1 00:00:00,000 --> 00:00:01,810 2 00:00:01,810 --> 00:00:04,240 The following content is provided under a Creative 3 00:00:04,240 --> 00:00:05,650 Commons license. 4 00:00:05,650 --> 00:00:08,680 Your support will help MIT OpenCourseWare continue to 5 00:00:08,680 --> 00:00:12,340 offer high quality educational resources for free. 6 00:00:12,340 --> 00:00:15,230 To make a donation or view additional materials from 7 00:00:15,230 --> 00:00:19,160 hundreds of MIT courses, visit MIT OpenCourseWare at 8 00:00:19,160 --> 00:00:20,410 ocw.mit.edu. 9 00:00:20,410 --> 00:00:26,000 10 00:00:26,000 --> 00:00:27,220 CHARLES LEISERSON: So today we're going to take a little 11 00:00:27,220 --> 00:00:33,120 bit closer look at what's happening under the covers 12 00:00:33,120 --> 00:00:37,240 when you compile a C program. 13 00:00:37,240 --> 00:00:42,080 But before we get into that, we did a little interesting 14 00:00:42,080 --> 00:00:51,160 correlation on your scores for the first problem, for the 15 00:00:51,160 --> 00:00:52,760 every bit one. 16 00:00:52,760 --> 00:00:55,810 So this is basically plotting. 17 00:00:55,810 --> 00:01:00,350 It's a scatter plot of how you did in your test coverage 18 00:01:00,350 --> 00:01:07,410 score versus how you did in your correctness score and 19 00:01:07,410 --> 00:01:11,050 performance, correctness and performance together. 20 00:01:11,050 --> 00:01:15,270 And what's interesting is that if you did better in your test 21 00:01:15,270 --> 00:01:19,940 coverage, you did better in your performance and 22 00:01:19,940 --> 00:01:22,161 correctness. 23 00:01:22,161 --> 00:01:24,640 OK, that's a pretty good correlation, right? 24 00:01:24,640 --> 00:01:26,465 There's some outliers here. 25 00:01:26,465 --> 00:01:27,925 But that's a pretty good correlation. 26 00:01:27,925 --> 00:01:36,170 27 00:01:36,170 --> 00:01:36,890 Yeah, John? 28 00:01:36,890 --> 00:01:37,643 Yeah? 29 00:01:37,643 --> 00:01:38,106 JOHN: The-- 30 00:01:38,106 --> 00:01:39,340 CHARLES LEISERSON: Do we have a handheld? 31 00:01:39,340 --> 00:01:40,360 Here we go. 32 00:01:40,360 --> 00:01:41,610 Just a second. 33 00:01:41,610 --> 00:01:47,660 34 00:01:47,660 --> 00:01:49,610 The only thing is we have to figure out how to turn it on. 35 00:01:49,610 --> 00:01:50,860 There we go. 36 00:01:50,860 --> 00:01:52,885 37 00:01:52,885 --> 00:01:55,640 JOHN: Yeah, so just to clarify, the people who 38 00:01:55,640 --> 00:01:58,515 seemingly got no test coverage but really good performance 39 00:01:58,515 --> 00:02:03,420 scores, what actually happened was that we tested for things 40 00:02:03,420 --> 00:02:07,630 that we didn't expect students to cover for like on feeding 41 00:02:07,630 --> 00:02:12,510 invalid values to better a set and better a get or testing 42 00:02:12,510 --> 00:02:14,830 for private functions to their implementations. 43 00:02:14,830 --> 00:02:19,190 So in reality they had better test suites than the coverage 44 00:02:19,190 --> 00:02:20,440 score would indicate. 45 00:02:20,440 --> 00:02:26,970 46 00:02:26,970 --> 00:02:28,140 CHARLES LEISERSON: So what are the lessons that 47 00:02:28,140 --> 00:02:31,430 one draws from this? 48 00:02:31,430 --> 00:02:35,270 So professional engineers know what the lessons are. 49 00:02:35,270 --> 00:02:39,320 So the lessons are that it is actually better, if you have a 50 00:02:39,320 --> 00:02:44,910 coding problem to do, to write tests first. 51 00:02:44,910 --> 00:02:48,850 Before you code you write your tests. 52 00:02:48,850 --> 00:02:52,360 And that actually speeds the development of 53 00:02:52,360 --> 00:02:53,880 fast correct code. 54 00:02:53,880 --> 00:02:55,080 It's actually faster. 55 00:02:55,080 --> 00:02:58,340 You get to the end result much faster. 56 00:02:58,340 --> 00:03:02,820 Because whenever you make an error in your program, you 57 00:03:02,820 --> 00:03:07,360 instantly know that you may have a problem rather than 58 00:03:07,360 --> 00:03:10,380 thinking that you're doing something OK and then 59 00:03:10,380 --> 00:03:15,120 discovering that, oh, in fact your code is, in fact, 60 00:03:15,120 --> 00:03:18,120 incorrect, and you're working away optimizing something 61 00:03:18,120 --> 00:03:25,030 that's not working So before coding, it's highly 62 00:03:25,030 --> 00:03:26,620 recommended that you write test. 63 00:03:26,620 --> 00:03:31,740 Also if you find a bug, when you find a bug, the first 64 00:03:31,740 --> 00:03:35,840 thing you should do is write a test for that bug if it wasn't 65 00:03:35,840 --> 00:03:38,310 already covered. 66 00:03:38,310 --> 00:03:40,490 Then you fix the bug. 67 00:03:40,490 --> 00:03:43,860 And then you make sure that your test now, that your new 68 00:03:43,860 --> 00:03:46,830 implementation, passes that particular one. 69 00:03:46,830 --> 00:03:48,850 Professional engineers know this. 70 00:03:48,850 --> 00:03:52,310 Professional software developers know this. 71 00:03:52,310 --> 00:03:54,360 It comes hard. 72 00:03:54,360 --> 00:03:59,760 And if you want a job at any top flight software firm, 73 00:03:59,760 --> 00:04:02,520 they're going to expect that you know that you write tests 74 00:04:02,520 --> 00:04:04,990 first before you do coding. 75 00:04:04,990 --> 00:04:08,100 76 00:04:08,100 --> 00:04:12,030 The second lesson isn't quite so obvious from this. 77 00:04:12,030 --> 00:04:14,780 But it's the second lesson that I think some people 78 00:04:14,780 --> 00:04:20,120 experienced in the class which was the idea of putting you in 79 00:04:20,120 --> 00:04:22,800 groups, in particular in pairs, was not so that you 80 00:04:22,800 --> 00:04:26,930 could do divide and conquer on the code. 81 00:04:26,930 --> 00:04:29,550 It was to do pair programming. 82 00:04:29,550 --> 00:04:33,380 And what we found was that a bunch of groups 83 00:04:33,380 --> 00:04:35,540 divided up the work. 84 00:04:35,540 --> 00:04:38,150 And they said, OK, it'll go faster if you do this one and 85 00:04:38,150 --> 00:04:39,800 I do that one. 86 00:04:39,800 --> 00:04:42,810 Once again that's probably a mistake. 87 00:04:42,810 --> 00:04:47,010 If you can sit together and take turns at the keyboard 88 00:04:47,010 --> 00:04:49,890 making the changes, it may seem like it's going slower to 89 00:04:49,890 --> 00:04:53,580 begin with, but it's amazing how many errors you catch and 90 00:04:53,580 --> 00:04:57,890 how quickly you find your errors because you're just 91 00:04:57,890 --> 00:04:59,560 talking with each other. 92 00:04:59,560 --> 00:05:02,990 And it's like, oh, duh. 93 00:05:02,990 --> 00:05:06,860 So good programmers know this. 94 00:05:06,860 --> 00:05:10,020 That it really helps to have more than one person 95 00:05:10,020 --> 00:05:12,560 understand what's going on in the code. 96 00:05:12,560 --> 00:05:15,580 So the people who had difficulty with their partners 97 00:05:15,580 --> 00:05:19,660 one way or another often did not, it was partly because 98 00:05:19,660 --> 00:05:22,560 they just divided up the work, you're responsible for that, 99 00:05:22,560 --> 00:05:25,290 oh, we got a bad grade on that, that's your fault. 100 00:05:25,290 --> 00:05:29,540 No, both partners own that grade 100%. 101 00:05:29,540 --> 00:05:35,220 And the best way to ensure is to work together. 102 00:05:35,220 --> 00:05:39,060 Now, this sometimes flies in the face of people who believe 103 00:05:39,060 --> 00:05:42,440 that they are clever or more experienced than somebody, 104 00:05:42,440 --> 00:05:47,720 than their partner, oh, I can do this much better on my own. 105 00:05:47,720 --> 00:05:51,370 Usually, that's true for little projects, but as the 106 00:05:51,370 --> 00:05:59,390 projects get bigger that becomes a much harder 107 00:05:59,390 --> 00:06:01,350 situation to deal with. 108 00:06:01,350 --> 00:06:04,980 It becomes the case that you really want two brains looking 109 00:06:04,980 --> 00:06:09,100 at the same thing, four eyes as opposed to two 110 00:06:09,100 --> 00:06:12,310 eyes looking at things. 111 00:06:12,310 --> 00:06:16,060 But I think, in particular, before coding write test. 112 00:06:16,060 --> 00:06:18,320 And we are right now working on improving the 113 00:06:18,320 --> 00:06:19,620 infrastructure. 114 00:06:19,620 --> 00:06:24,940 One of the things that they have in most companies is, at 115 00:06:24,940 --> 00:06:29,510 the very minimum, they have what's called a nightly build. 116 00:06:29,510 --> 00:06:32,710 Nightly build says they take all the software, they build 117 00:06:32,710 --> 00:06:35,800 it, and then they run regression tests against it 118 00:06:35,800 --> 00:06:37,990 all night while everybody's home sleeping. 119 00:06:37,990 --> 00:06:40,480 Come in the next morning, here's the things that broke. 120 00:06:40,480 --> 00:06:44,120 And if you broke the build, you got some 121 00:06:44,120 --> 00:06:46,760 work to do that morning. 122 00:06:46,760 --> 00:06:50,260 And it's generally not a good idea to break the build. 123 00:06:50,260 --> 00:06:54,050 What has been demonstrating, in fact, is that continuous 124 00:06:54,050 --> 00:06:56,520 build is even better. 125 00:06:56,520 --> 00:06:58,940 This is where, whenever you make a change to the program, 126 00:06:58,940 --> 00:07:02,250 you run the full suite of tests on it. 127 00:07:02,250 --> 00:07:04,160 And we're going to look into it. 128 00:07:04,160 --> 00:07:05,830 We have to see what our resources are. 129 00:07:05,830 --> 00:07:09,660 As you know, our TAs are a limited resource. 130 00:07:09,660 --> 00:07:11,820 But we're going to look into seeing whether we can provide 131 00:07:11,820 --> 00:07:14,030 more of that kind of infrastructure on some of the 132 00:07:14,030 --> 00:07:15,940 later projects for you folks. 133 00:07:15,940 --> 00:07:18,190 So you can sort of see the matrix that we 134 00:07:18,190 --> 00:07:19,370 eventually got to you. 135 00:07:19,370 --> 00:07:21,920 You can see that develop in real time. 136 00:07:21,920 --> 00:07:24,650 How am I doing against other people's tests? 137 00:07:24,650 --> 00:07:26,740 How are they doing against my tests, et cetera? 138 00:07:26,740 --> 00:07:29,600 139 00:07:29,600 --> 00:07:31,410 So we'll see whether we can do that. 140 00:07:31,410 --> 00:07:34,810 But what's funny is you think that it'd be faster to just 141 00:07:34,810 --> 00:07:35,950 code and do it. 142 00:07:35,950 --> 00:07:39,120 Computer science is full of wonderful paradoxes. 143 00:07:39,120 --> 00:07:44,850 And one of them is that doing things like writing the extra 144 00:07:44,850 --> 00:07:51,120 code to test is actually faster than not writing it, 145 00:07:51,120 --> 00:07:52,650 surprisingly. 146 00:07:52,650 --> 00:07:55,460 It really gets you to the end result a lot faster. 147 00:07:55,460 --> 00:07:56,710 Any questions about that? 148 00:07:56,710 --> 00:07:59,200 149 00:07:59,200 --> 00:08:00,450 Any comments about that? 150 00:08:00,450 --> 00:08:03,410 151 00:08:03,410 --> 00:08:08,330 Let's talk about our today. 152 00:08:08,330 --> 00:08:11,000 So today we're going to talk mostly about single threaded 153 00:08:11,000 --> 00:08:12,990 performance. 154 00:08:12,990 --> 00:08:15,020 This is one instruction stream that you're 155 00:08:15,020 --> 00:08:16,970 trying to make go fast. 156 00:08:16,970 --> 00:08:23,620 But if you look at today's computing milieu, how all of 157 00:08:23,620 --> 00:08:26,380 the computers are used, what do you have? 158 00:08:26,380 --> 00:08:29,650 You've got networks of multi-core clusters. 159 00:08:29,650 --> 00:08:32,270 It's parallelism everywhere. 160 00:08:32,270 --> 00:08:35,950 You've got shared memory among processors within a chip. 161 00:08:35,950 --> 00:08:39,409 You've got message passing among machines in a cluster. 162 00:08:39,409 --> 00:08:42,210 You've got network protocols among clusters so that you can 163 00:08:42,210 --> 00:08:47,400 do wide area things. 164 00:08:47,400 --> 00:08:50,620 Yet we're saying, no, let's take a look at what happens on 165 00:08:50,620 --> 00:08:54,320 one core on one machine. 166 00:08:54,320 --> 00:08:59,920 So why is that important to focus first on what 167 00:08:59,920 --> 00:09:02,590 one core can do? 168 00:09:02,590 --> 00:09:05,530 Why study single threaded performance at all? 169 00:09:05,530 --> 00:09:06,920 Let's just go do the parallel stuff. 170 00:09:06,920 --> 00:09:08,170 That's more fun anyway. 171 00:09:08,170 --> 00:09:10,820 172 00:09:10,820 --> 00:09:13,630 Well, there are a couple of reasons that I can think of. 173 00:09:13,630 --> 00:09:22,010 The first one is at the end of the day, even if you've got 174 00:09:22,010 --> 00:09:27,230 something running widely in parallel, the code is running 175 00:09:27,230 --> 00:09:29,560 in each core in a single threaded manner. 176 00:09:29,560 --> 00:09:32,050 You just have a bunch of them. 177 00:09:32,050 --> 00:09:36,600 And so if you've given up a factor of two or a factor of 178 00:09:36,600 --> 00:09:40,690 four in performance or even more, as you're aware, you can 179 00:09:40,690 --> 00:09:44,240 sometimes make it orders of magnitude, but in performance 180 00:09:44,240 --> 00:09:47,010 what you're saying is that you're going to end up using 181 00:09:47,010 --> 00:09:52,060 much more resources to do your particular job in parallel. 182 00:09:52,060 --> 00:09:53,690 And resources is money. 183 00:09:53,690 --> 00:10:03,360 So if I can do the job with a cluster of 16 processors and 184 00:10:03,360 --> 00:10:06,000 somebody else can do it in a cluster with only four 185 00:10:06,000 --> 00:10:12,360 processors, hey, they just spent a quarter the amount on 186 00:10:12,360 --> 00:10:17,080 not just the capital investment in that hardware 187 00:10:17,080 --> 00:10:21,720 but also the operating costs of what it cost to actually 188 00:10:21,720 --> 00:10:27,040 cool and provide electricity to and maintain and so forth. 189 00:10:27,040 --> 00:10:29,050 All that gets much cheaper. 190 00:10:29,050 --> 00:10:31,100 So if you get good single thread performance, it 191 00:10:31,100 --> 00:10:31,730 translates. 192 00:10:31,730 --> 00:10:36,300 That's kind of the direct reason. 193 00:10:36,300 --> 00:10:40,350 The indirect reason is, for studying it, is that many of 194 00:10:40,350 --> 00:10:42,160 the lessons will generalize. 195 00:10:42,160 --> 00:10:44,310 So things that we'll see for single core, there is an 196 00:10:44,310 --> 00:10:47,890 analogy when you start looking at parallel 197 00:10:47,890 --> 00:10:50,640 and distributed systems. 198 00:10:50,640 --> 00:10:53,320 So that's a little less concrete. 199 00:10:53,320 --> 00:10:57,360 But as you'll see as you gain experience, you'll see that 200 00:10:57,360 --> 00:11:00,980 there's a lot of lessons that generalize to how do you think 201 00:11:00,980 --> 00:11:05,715 about performance no matter what the context. 202 00:11:05,715 --> 00:11:09,660 203 00:11:09,660 --> 00:11:12,860 So what about a single threaded machine? 204 00:11:12,860 --> 00:11:14,130 What's it like? 205 00:11:14,130 --> 00:11:16,370 So some of this is going to be a little bit review, but we're 206 00:11:16,370 --> 00:11:17,590 going to just sort of go deeper. 207 00:11:17,590 --> 00:11:20,330 We've sort of been taking layers off the onion. 208 00:11:20,330 --> 00:11:22,210 And today we're going to take a few more 209 00:11:22,210 --> 00:11:23,960 layers off the onion. 210 00:11:23,960 --> 00:11:26,680 So you have inside a processor core. 211 00:11:26,680 --> 00:11:28,340 You've got registers. 212 00:11:28,340 --> 00:11:30,930 You've got the functional units to do your ALU 213 00:11:30,930 --> 00:11:34,880 operations, floating point units, vector units these 214 00:11:34,880 --> 00:11:39,760 days, and all the stuff to do instruction, execution, and 215 00:11:39,760 --> 00:11:42,300 coordination, scheduling, out of order 216 00:11:42,300 --> 00:11:45,100 execution, and so forth. 217 00:11:45,100 --> 00:11:48,510 In addition then, you have a memory hierarchy. 218 00:11:48,510 --> 00:11:50,780 Within the core, you typically have registers 219 00:11:50,780 --> 00:11:52,520 and L1 and L2 caches. 220 00:11:52,520 --> 00:11:57,910 And then outside the core often is the L3 cache. 221 00:11:57,910 --> 00:12:00,480 DRAM memory, you may have a solid-state drive 222 00:12:00,480 --> 00:12:02,780 these days and disk. 223 00:12:02,780 --> 00:12:05,240 And so in that context, you're trying to make 224 00:12:05,240 --> 00:12:06,490 your code run fast. 225 00:12:06,490 --> 00:12:09,540 226 00:12:09,540 --> 00:12:15,100 So when you compile the piece of code, so here I have a 227 00:12:15,100 --> 00:12:15,660 piece of code. 228 00:12:15,660 --> 00:12:19,370 I'm always amused when I put up a Fibonacci as the example. 229 00:12:19,370 --> 00:12:23,210 Because this is a really terrible way to compute 230 00:12:23,210 --> 00:12:25,580 Fibonacci numbers. 231 00:12:25,580 --> 00:12:28,530 So this is an exponential time algorithm for computing 232 00:12:28,530 --> 00:12:30,120 Fibonacci numbers. 233 00:12:30,120 --> 00:12:35,100 And you may be aware you can do this in linear time just by 234 00:12:35,100 --> 00:12:36,910 adding up from the bottom. 235 00:12:36,910 --> 00:12:39,070 In fact, if you take the algorithms course, you learn 236 00:12:39,070 --> 00:12:44,220 that you can actually do this in logarithmic time by matrix, 237 00:12:44,220 --> 00:12:46,770 recursive squaring of matrices. 238 00:12:46,770 --> 00:12:49,510 So it's sort of interesting to put up something where we say 239 00:12:49,510 --> 00:12:52,100 we're going to optimize this. 240 00:12:52,100 --> 00:12:54,570 And, of course, we'll get a constant factor improvement on 241 00:12:54,570 --> 00:12:55,800 something like this. 242 00:12:55,800 --> 00:13:01,610 But, in fact, really this is a terrible program to write for 243 00:13:01,610 --> 00:13:02,380 optimization. 244 00:13:02,380 --> 00:13:04,720 But it's good didactically. 245 00:13:04,720 --> 00:13:07,460 And Fibonacci numbers are fun anyway. 246 00:13:07,460 --> 00:13:13,120 So typically what happens is when you run GCC on your .C 247 00:13:13,120 --> 00:13:19,980 file and produce a binary, what happens is it produces 248 00:13:19,980 --> 00:13:23,010 the machine code, which is basically a string of bytes, 249 00:13:23,010 --> 00:13:24,810 zeros and ones. 250 00:13:24,810 --> 00:13:28,570 And that goes, when you run the program, that goes through 251 00:13:28,570 --> 00:13:30,650 the hardware interpreter. 252 00:13:30,650 --> 00:13:33,430 So the hardware of the machine is doing an interpretation of 253 00:13:33,430 --> 00:13:37,150 these very simple instructions and produces an execution. 254 00:13:37,150 --> 00:13:40,980 But, in fact, there's actually four stages that go on inside 255 00:13:40,980 --> 00:13:44,710 of GCC if you type a command like this. 256 00:13:44,710 --> 00:13:47,320 The first thing is what's called preprocessing. 257 00:13:47,320 --> 00:13:51,360 And what that does is it does any macro expansion and so 258 00:13:51,360 --> 00:13:55,280 forth, things that are just basically on the level of 259 00:13:55,280 --> 00:13:57,800 textual substitutions before you get into 260 00:13:57,800 --> 00:13:59,210 the guts of the compiler. 261 00:13:59,210 --> 00:14:02,310 Then you actually do the compiler. 262 00:14:02,310 --> 00:14:07,930 And that produces a version of machine code called assembly 263 00:14:07,930 --> 00:14:11,140 language, which we'll see in just a minute. 264 00:14:11,140 --> 00:14:16,940 And from that version of assembly language it then goes 265 00:14:16,940 --> 00:14:21,390 into a process called linking and loading, which actually 266 00:14:21,390 --> 00:14:24,280 causes it to produce the binary that 267 00:14:24,280 --> 00:14:27,510 you can then execute. 268 00:14:27,510 --> 00:14:30,350 So all four stages are included here. 269 00:14:30,350 --> 00:14:36,070 And there are switches to GCC that let you do only one or 270 00:14:36,070 --> 00:14:37,430 all of these things. 271 00:14:37,430 --> 00:14:40,360 You can, for example, run the preprocessor GCC. 272 00:14:40,360 --> 00:14:43,050 You can tell it to run the preprocessor alone and see 273 00:14:43,050 --> 00:14:46,180 what all your macros expanded to. 274 00:14:46,180 --> 00:14:46,410 Yes? 275 00:14:46,410 --> 00:14:47,265 Question? 276 00:14:47,265 --> 00:14:48,477 AUDIENCE: What's the difference between compiling 277 00:14:48,477 --> 00:14:50,180 and assembling? 278 00:14:50,180 --> 00:14:53,000 CHARLES LEISERSON: So compiling reduces it to 279 00:14:53,000 --> 00:14:55,090 essentially assembly language. 280 00:14:55,090 --> 00:15:00,900 And then assembling is taking that assembly language and 281 00:15:00,900 --> 00:15:04,713 producing the machine binary. 282 00:15:04,713 --> 00:15:06,162 AUDIENCE: I was going to say, there's a one-to-one 283 00:15:06,162 --> 00:15:09,060 correspondence between machine code and assembly, but there's 284 00:15:09,060 --> 00:15:10,750 not a one-to-one correspondence between C code 285 00:15:10,750 --> 00:15:11,958 and assembly. 286 00:15:11,958 --> 00:15:12,450 CHARLES LEISERSON: Yeah. 287 00:15:12,450 --> 00:15:14,850 So there's actually not quite a one to one, 288 00:15:14,850 --> 00:15:17,020 but it's very close. 289 00:15:17,020 --> 00:15:19,060 It's very close. 290 00:15:19,060 --> 00:15:21,720 So you can think of it as one to one between assembly 291 00:15:21,720 --> 00:15:22,180 machine code. 292 00:15:22,180 --> 00:15:26,390 But assembly is, in some sense, a more human readable 293 00:15:26,390 --> 00:15:29,890 and understandable version of machine code. 294 00:15:29,890 --> 00:15:31,760 In fact, that's what we're going to talk about. 295 00:15:31,760 --> 00:15:34,210 So let's go directly to assembly code. 296 00:15:34,210 --> 00:15:38,530 To do that I can use the minus S switch. 297 00:15:38,530 --> 00:15:39,970 Now it turns out it's also helpful to 298 00:15:39,970 --> 00:15:41,680 use the minus G switch. 299 00:15:41,680 --> 00:15:46,840 Minus G says give me all the debugger symbol tables. 300 00:15:46,840 --> 00:15:48,880 And what that makes it is so that you can actually read the 301 00:15:48,880 --> 00:15:50,360 assembly language. 302 00:15:50,360 --> 00:15:52,980 If you don't have that information, then you don't 303 00:15:52,980 --> 00:15:56,560 know what the programmer wrote as the symbols. 304 00:15:56,560 --> 00:16:03,050 Instead you will get computer generated 305 00:16:03,050 --> 00:16:04,620 symbol names for things. 306 00:16:04,620 --> 00:16:07,180 And you don't have any meaning to those. 307 00:16:07,180 --> 00:16:08,900 So it's really a good idea to use minus 308 00:16:08,900 --> 00:16:10,800 G and minus S together. 309 00:16:10,800 --> 00:16:13,700 And this basically provides a convenient symbolic 310 00:16:13,700 --> 00:16:15,340 representation of the machine language. 311 00:16:15,340 --> 00:16:17,730 And this is sort of the type of thing that you'll get, 312 00:16:17,730 --> 00:16:20,080 something coming out that looks like this. 313 00:16:20,080 --> 00:16:21,800 It's basically an Ascii. 314 00:16:21,800 --> 00:16:26,140 It's in text, characters, rather than being in the 315 00:16:26,140 --> 00:16:28,710 binary executable. 316 00:16:28,710 --> 00:16:34,770 And if you want, you can find out all the vagaries of it. 317 00:16:34,770 --> 00:16:40,690 This is one site that has some reasonable documentation on 318 00:16:40,690 --> 00:16:42,560 the GNU assembler. 319 00:16:42,560 --> 00:16:45,950 It's actually not as good on the instructions, but it's 320 00:16:45,950 --> 00:16:49,330 really good on all the directives, which we'll talk 321 00:16:49,330 --> 00:16:53,730 about in a minute like .global and .type and all that stuff. 322 00:16:53,730 --> 00:16:54,980 It's very good on that stuff. 323 00:16:54,980 --> 00:16:58,520 324 00:16:58,520 --> 00:17:00,260 There's another thing that you can do. 325 00:17:00,260 --> 00:17:04,220 And once again, it's also helpful if you've produced a 326 00:17:04,220 --> 00:17:06,569 binary that has the symbol table. 327 00:17:06,569 --> 00:17:11,760 And that is to do a dump of the object code. 328 00:17:11,760 --> 00:17:16,030 And when you do a dump of the object code, what it does is 329 00:17:16,030 --> 00:17:19,079 you basically give it an executable and it goes 330 00:17:19,079 --> 00:17:24,020 backwards the other way, take this executable and undo one 331 00:17:24,020 --> 00:17:27,800 step, disassemble it. 332 00:17:27,800 --> 00:17:31,150 And what's good about object dump is that it gives you, 333 00:17:31,150 --> 00:17:33,710 first of all, these are all the byte codes of the 334 00:17:33,710 --> 00:17:34,990 instructions. 335 00:17:34,990 --> 00:17:38,750 Also if you've got the minus S says interleave the source 336 00:17:38,750 --> 00:17:42,060 code, so you can see, here's the source code interleaved. 337 00:17:42,060 --> 00:17:44,070 So you can see which regions of code 338 00:17:44,070 --> 00:17:47,280 depend on which things. 339 00:17:47,280 --> 00:17:50,600 And so it basically tells you where in memory it's being 340 00:17:50,600 --> 00:17:53,970 loaded, it's been loaded, what the instructions are. 341 00:17:53,970 --> 00:17:57,560 And then it gives you the assembly interpretation of 342 00:17:57,560 --> 00:17:59,230 that machine binary. 343 00:17:59,230 --> 00:18:02,600 And this is where you can see it's almost one to one what's 344 00:18:02,600 --> 00:18:03,180 going on here. 345 00:18:03,180 --> 00:18:05,295 Here we have a push of an operand. 346 00:18:05,295 --> 00:18:07,620 And that notice is just a one byte code. 347 00:18:07,620 --> 00:18:13,330 Whereas here we've got an opcode and two arguments. 348 00:18:13,330 --> 00:18:16,690 And it has three bytes as it turns out. 349 00:18:16,690 --> 00:18:18,640 So you can see there's sort of a correspondence. 350 00:18:18,640 --> 00:18:20,208 Yeah, question? 351 00:18:20,208 --> 00:18:21,196 AUDIENCE: How does [? logic then take the ?] 352 00:18:21,196 --> 00:18:24,160 machine language code and go to-- 353 00:18:24,160 --> 00:18:27,124 how does it know the function names and stuff? 354 00:18:27,124 --> 00:18:28,820 CHARLES LEISERSON: It knows the function names because 355 00:18:28,820 --> 00:18:33,110 when you compile it with -g, it produces, in addition to 356 00:18:33,110 --> 00:18:36,250 producing the binary, it produces a separate segment 357 00:18:36,250 --> 00:18:39,870 that's not loaded in that has all that information that 358 00:18:39,870 --> 00:18:43,030 says, oh, at this location is where this symbol is. 359 00:18:43,030 --> 00:18:46,840 And it produces all that as stuff that's never loaded in 360 00:18:46,840 --> 00:18:49,700 at run time but which is there in order to 361 00:18:49,700 --> 00:18:51,290 aid debuggers mainly. 362 00:18:51,290 --> 00:18:52,610 Question? 363 00:18:52,610 --> 00:18:56,370 AUDIENCE: To compile something not using the gflag and then 364 00:18:56,370 --> 00:18:58,715 you do an object dump, how would that work? 365 00:18:58,715 --> 00:19:01,430 CHARLES LEISERSON: Then what happens is, first of all, you 366 00:19:01,430 --> 00:19:06,280 would not be able to get this stuff interleaved. 367 00:19:06,280 --> 00:19:09,720 And then things like here where it says fib, well, fib 368 00:19:09,720 --> 00:19:14,160 may be an external name so you might know it anyway. 369 00:19:14,160 --> 00:19:16,970 But if it were an internal name, you would not be able to 370 00:19:16,970 --> 00:19:18,800 see what it was. 371 00:19:18,800 --> 00:19:21,360 Yeah, if you're going to respond let's 372 00:19:21,360 --> 00:19:22,660 get you on mike here. 373 00:19:22,660 --> 00:19:24,920 Why don't you just hold this? 374 00:19:24,920 --> 00:19:27,270 JOHN: Yeah, so you'll generally get the function 375 00:19:27,270 --> 00:19:30,370 names so you know roughly a huge blob of assembly 376 00:19:30,370 --> 00:19:32,100 corresponds to a function. 377 00:19:32,100 --> 00:19:34,740 But you won't be able to get any information about what 378 00:19:34,740 --> 00:19:39,520 variables are in which registers or what position the 379 00:19:39,520 --> 00:19:41,630 sixth line of assembly corresponds to in terms of 380 00:19:41,630 --> 00:19:42,880 your source code. 381 00:19:42,880 --> 00:19:45,580 382 00:19:45,580 --> 00:19:47,170 CHARLES LEISERSON: Then the other thing that you can do is 383 00:19:47,170 --> 00:19:50,690 you can actually take the assembler, the assembly code, 384 00:19:50,690 --> 00:19:53,140 if you produce just the assembly code, and if you tell 385 00:19:53,140 --> 00:19:57,540 GCC to take a .s file, which is the assembly code, you can 386 00:19:57,540 --> 00:20:00,146 produce the machine code from it. 387 00:20:00,146 --> 00:20:05,910 And so one thing that you can do is you can produce a .s 388 00:20:05,910 --> 00:20:11,820 file and then edit it in Emacs or VI or whatever your 389 00:20:11,820 --> 00:20:16,180 favorite text editor is and then assemble it with GCC. 390 00:20:16,180 --> 00:20:18,430 So you can actually modify what the 391 00:20:18,430 --> 00:20:21,480 machine code is directly. 392 00:20:21,480 --> 00:20:23,220 And that's what we're going to spend a little bit of time 393 00:20:23,220 --> 00:20:24,580 doing today. 394 00:20:24,580 --> 00:20:28,220 Let's go in and see what the compiler generates and then 395 00:20:28,220 --> 00:20:29,470 let's twiddle it a bit. 396 00:20:29,470 --> 00:20:34,630 397 00:20:34,630 --> 00:20:37,760 So here's what we're going to expect that you do, that 398 00:20:37,760 --> 00:20:39,010 you're able to do. 399 00:20:39,010 --> 00:20:41,770 400 00:20:41,770 --> 00:20:45,320 We expect in this class that you're going to be able to 401 00:20:45,320 --> 00:20:50,370 understand how a compiler implements the C linguistic 402 00:20:50,370 --> 00:20:52,575 constructs using x86 instructions. 403 00:20:52,575 --> 00:20:55,320 404 00:20:55,320 --> 00:21:00,060 We're going to expect that you can read x86 assembly language 405 00:21:00,060 --> 00:21:01,520 with the aid of a manual. 406 00:21:01,520 --> 00:21:03,830 We don't expect that you know all the instructions, but the 407 00:21:03,830 --> 00:21:08,350 basic ones we expect that you know what those are. 408 00:21:08,350 --> 00:21:10,370 We expect that you're going to be able to make simple 409 00:21:10,370 --> 00:21:14,840 modifications to the assembly language generated by a 410 00:21:14,840 --> 00:21:19,200 compiler, and that you would know, if push came to shove, 411 00:21:19,200 --> 00:21:21,610 how to write your own machine code on your own. 412 00:21:21,610 --> 00:21:23,700 That's not something we're going to expect that you do, 413 00:21:23,700 --> 00:21:26,920 but you would know how to get started to do that if at some 414 00:21:26,920 --> 00:21:28,310 point you said, oh, I really have to 415 00:21:28,310 --> 00:21:29,560 write this in assembler. 416 00:21:29,560 --> 00:21:32,410 417 00:21:32,410 --> 00:21:35,150 So this is, as I say, really we're going to take off some 418 00:21:35,150 --> 00:21:40,036 layers of the onion today, try to get down what's going on. 419 00:21:40,036 --> 00:21:44,420 It turns out this is actually kind of fun. 420 00:21:44,420 --> 00:21:49,100 Now, the part that's not fun at some level is the x86 64 421 00:21:49,100 --> 00:21:50,350 machine model. 422 00:21:50,350 --> 00:21:53,670 423 00:21:53,670 --> 00:21:55,680 The x86 is what's called a complex 424 00:21:55,680 --> 00:21:59,530 instruction set computer. 425 00:21:59,530 --> 00:22:06,590 And these, long ago, were demonstrated to be inferior to 426 00:22:06,590 --> 00:22:10,940 so-called reduced instruction set computers. 427 00:22:10,940 --> 00:22:14,050 But that hasn't mattered in the marketplace. 428 00:22:14,050 --> 00:22:16,290 What's mattered in the marketplace is who could build 429 00:22:16,290 --> 00:22:18,240 better and faster chips. 430 00:22:18,240 --> 00:22:22,940 And also the amount of people who started using the x86 431 00:22:22,940 --> 00:22:28,240 instruction set has produced a huge legacy and inertia. 432 00:22:28,240 --> 00:22:30,860 It's sort of like some people might argue that Esperanto is 433 00:22:30,860 --> 00:22:32,140 a better language for everybody 434 00:22:32,140 --> 00:22:34,480 to learn than English. 435 00:22:34,480 --> 00:22:38,350 But how come English with all its complexities and so forth, 436 00:22:38,350 --> 00:22:42,500 and I'm sure for some of you who have learned English as a 437 00:22:42,500 --> 00:22:45,370 second language, it's like it's a crazy language. 438 00:22:45,370 --> 00:22:46,640 Who do you learn English? 439 00:22:46,640 --> 00:22:50,040 Well, it's because that's what everybody's learning. 440 00:22:50,040 --> 00:22:52,560 That's where the legacy is. 441 00:22:52,560 --> 00:22:57,130 And so x86 is very much like the English of 442 00:22:57,130 --> 00:22:58,920 machines these days. 443 00:22:58,920 --> 00:23:05,030 So in this model there's basically a flat 64-bit 444 00:23:05,030 --> 00:23:06,280 address space. 445 00:23:06,280 --> 00:23:09,510 446 00:23:09,510 --> 00:23:14,710 There are 16 64-bit general purpose registers, and then 447 00:23:14,710 --> 00:23:18,800 what are some segment registers, a register full of 448 00:23:18,800 --> 00:23:27,910 flags, an instruction pointer register, rest in peace. 449 00:23:27,910 --> 00:23:33,170 They're eight 80-bit floating point data registers, some 450 00:23:33,170 --> 00:23:37,850 control status registers, an opcode register, a floating 451 00:23:37,850 --> 00:23:40,540 point instruction pointer register, and a floating point 452 00:23:40,540 --> 00:23:47,000 data pointing register, some MMX registers for the 453 00:23:47,000 --> 00:23:56,770 multimedia extensions, and a 128-bit XMM registers for the 454 00:23:56,770 --> 00:24:01,500 SSE instructions, which are the ability to have an opcode 455 00:24:01,500 --> 00:24:05,040 run over several pieces of data at once, short vectors, 456 00:24:05,040 --> 00:24:09,440 vector instructions, and a 32-bit register that frankly I 457 00:24:09,440 --> 00:24:13,410 don't have a clue as to what it does. 458 00:24:13,410 --> 00:24:16,010 So, fortunately, we don't have to know all these. 459 00:24:16,010 --> 00:24:17,930 You can look at the architecture manual if any of 460 00:24:17,930 --> 00:24:20,250 these become important. 461 00:24:20,250 --> 00:24:24,170 So our goal is not to memorize the x86 instruction set. 462 00:24:24,170 --> 00:24:30,390 That would be a punishment probably worse than death. 463 00:24:30,390 --> 00:24:33,175 The only thing worse would be learning all of C++. 464 00:24:33,175 --> 00:24:36,960 465 00:24:36,960 --> 00:24:40,800 So here's the general registers. 466 00:24:40,800 --> 00:24:44,260 So there are basically 64-bit registers. 467 00:24:44,260 --> 00:24:46,090 And here's the mnemonics that they have. 468 00:24:46,090 --> 00:24:48,370 So you can see is all very mnemonic, right? 469 00:24:48,370 --> 00:24:50,110 We got some of them that are numbered. 470 00:24:50,110 --> 00:24:51,560 How come they're all just not numbered? 471 00:24:51,560 --> 00:24:53,670 I mean come on, right? 472 00:24:53,670 --> 00:24:54,150 I know why. 473 00:24:54,150 --> 00:24:54,880 I know why. 474 00:24:54,880 --> 00:24:56,130 Don't tell me. 475 00:24:56,130 --> 00:24:58,660 476 00:24:58,660 --> 00:25:01,740 So what you get to do is look at and remember that there are 477 00:25:01,740 --> 00:25:06,240 all these fun registers. 478 00:25:06,240 --> 00:25:11,520 And what they did is the x86 64 architecture grew out of 479 00:25:11,520 --> 00:25:14,730 the x86 which was 32-bit. 480 00:25:14,730 --> 00:25:18,340 Well, in fact, originally it was 16-bit. 481 00:25:18,340 --> 00:25:21,930 And it's been extended twice to have more bits in the 482 00:25:21,930 --> 00:25:23,580 instruction word so that now it's a 483 00:25:23,580 --> 00:25:25,730 64-bit instruction word. 484 00:25:25,730 --> 00:25:28,620 And what they did in order to make it so that they could run 485 00:25:28,620 --> 00:25:32,750 legacy code more easily, which might have been written with a 486 00:25:32,750 --> 00:25:37,080 smaller word size, is they've overlap so that the EAX 487 00:25:37,080 --> 00:25:39,370 register, for example, is the low order 488 00:25:39,370 --> 00:25:43,740 32-bits of the RAX register. 489 00:25:43,740 --> 00:25:47,320 So what you do is you'll see that R is the prefix that 490 00:25:47,320 --> 00:25:52,182 says, hey, that's a 64-bit register. 491 00:25:52,182 --> 00:25:56,380 E is the prefix that says that it is-- 492 00:25:56,380 --> 00:25:57,920 Whoops, I made a mistake there. 493 00:25:57,920 --> 00:25:59,250 Those should all be Es. 494 00:25:59,250 --> 00:26:03,530 495 00:26:03,530 --> 00:26:05,200 Oh, no, sorry, no, there's are correct. 496 00:26:05,200 --> 00:26:08,310 These are R and then with D because these are the 497 00:26:08,310 --> 00:26:09,760 extended ones, yes. 498 00:26:09,760 --> 00:26:17,410 So these are D. So R and D, that means also that it's 16. 499 00:26:17,410 --> 00:26:20,280 So you can see just how easy this is to remember without a 500 00:26:20,280 --> 00:26:21,670 cheat sheet, right? 501 00:26:21,670 --> 00:26:23,540 And then you go down to 15, et cetera. 502 00:26:23,540 --> 00:26:25,980 And so you can go all the way down to byte naming, the low 503 00:26:25,980 --> 00:26:28,510 order byte of the registers. 504 00:26:28,510 --> 00:26:31,450 In addition, it turns out that that's not all. 505 00:26:31,450 --> 00:26:36,150 But the high order byte of the 16-bit registers are also 506 00:26:36,150 --> 00:26:40,365 available as independently named registers. 507 00:26:40,365 --> 00:26:45,830 508 00:26:45,830 --> 00:26:49,590 When you're using this in a C program, there's a convention 509 00:26:49,590 --> 00:26:50,600 that C has. 510 00:26:50,600 --> 00:26:54,570 And it's actually different on Windows from on Linux. 511 00:26:54,570 --> 00:26:57,450 Because there's no reason they should make those things 512 00:26:57,450 --> 00:26:58,010 compatible. 513 00:26:58,010 --> 00:27:01,040 That would be too easy. 514 00:27:01,040 --> 00:27:02,460 So instead they have different ones. 515 00:27:02,460 --> 00:27:07,480 But the ones on Linux, this is essentially the structure. 516 00:27:07,480 --> 00:27:10,810 What happens when you call a subroutine is generally you're 517 00:27:10,810 --> 00:27:15,250 passing the arguments to the subroutine in registers. 518 00:27:15,250 --> 00:27:17,490 And in fact the first six arguments are 519 00:27:17,490 --> 00:27:19,360 passed in these registers. 520 00:27:19,360 --> 00:27:22,790 RDI, you'll get very familiar with RDI. 521 00:27:22,790 --> 00:27:26,660 Because that's where the first argument is always passed. 522 00:27:26,660 --> 00:27:30,530 And almost all your functions will have a first argument, 523 00:27:30,530 --> 00:27:33,880 except for the ones that have no arguments, and then the 524 00:27:33,880 --> 00:27:36,080 second arguments, the third, and so forth, and 525 00:27:36,080 --> 00:27:37,890 then fifth and sixth. 526 00:27:37,890 --> 00:27:41,550 If you get more than six, then it turns out, then you start 527 00:27:41,550 --> 00:27:44,120 passing arguments through memory. 528 00:27:44,120 --> 00:27:47,120 But otherwise the convention is that the arguments are 529 00:27:47,120 --> 00:27:49,650 passed through registers. 530 00:27:49,650 --> 00:27:52,220 There are a couple of other important registers. 531 00:27:52,220 --> 00:27:59,320 One here is the return value always comes back in RAX. 532 00:27:59,320 --> 00:28:03,840 So when a function returns, boom, that's where, RAX is 533 00:28:03,840 --> 00:28:07,210 where the value of the return is. 534 00:28:07,210 --> 00:28:11,440 There is a base pointer and a stack pointer which give you 535 00:28:11,440 --> 00:28:15,900 the stack frame so that when you do a push and want to push 536 00:28:15,900 --> 00:28:18,880 local variables those are telling you the limits of your 537 00:28:18,880 --> 00:28:19,520 local variats. 538 00:28:19,520 --> 00:28:21,780 And we'll talk more about that a little bit. 539 00:28:21,780 --> 00:28:23,390 And then there are a variety of other ones. 540 00:28:23,390 --> 00:28:28,880 Some are callee saved and some are caller saved. 541 00:28:28,880 --> 00:28:30,880 And you can refer to this chart. 542 00:28:30,880 --> 00:28:36,000 And there are others similar to it in the various manuals. 543 00:28:36,000 --> 00:28:39,730 Now, it gets pretty confusing, if this isn't confusing enough 544 00:28:39,730 --> 00:28:40,980 for the naming. 545 00:28:40,980 --> 00:28:42,960 546 00:28:42,960 --> 00:28:45,540 Let's go on to how you name data types. 547 00:28:45,540 --> 00:28:47,530 And I think some of you have already experienced this a 548 00:28:47,530 --> 00:28:50,960 little bit, the beauties of the data types. 549 00:28:50,960 --> 00:28:56,410 So in C, they have all these different data types such as 550 00:28:56,410 --> 00:28:58,580 I'm listing here. 551 00:28:58,580 --> 00:29:02,100 And if you want to generate a constant of that size, so 552 00:29:02,100 --> 00:29:06,570 sometimes the compiler will coerce a value from one type 553 00:29:06,570 --> 00:29:07,330 to another. 554 00:29:07,330 --> 00:29:09,220 But sometimes it won't. 555 00:29:09,220 --> 00:29:11,520 And so if you want to have a constant, and I've just given 556 00:29:11,520 --> 00:29:14,410 a couple things here, for example, if you want it to be 557 00:29:14,410 --> 00:29:18,710 just an int, you can just write the number. 558 00:29:18,710 --> 00:29:20,350 But if you want it to be unsigned, you have to 559 00:29:20,350 --> 00:29:22,000 put a U after it. 560 00:29:22,000 --> 00:29:23,620 Or if you want it to be a long, you have to 561 00:29:23,620 --> 00:29:26,210 put an L after it. 562 00:29:26,210 --> 00:29:29,390 And for many things it'll get coerced automatically to the 563 00:29:29,390 --> 00:29:32,320 right type because if you do an operator with another 564 00:29:32,320 --> 00:29:35,100 argument it will be coerced to that type. 565 00:29:35,100 --> 00:29:37,490 But some of you got burned on some of the shift things to 566 00:29:37,490 --> 00:29:44,570 begin with because it wasn't clear what exactly the sizes. 567 00:29:44,570 --> 00:29:50,230 Well, you can be explicit in C and name them using this 568 00:29:50,230 --> 00:29:52,690 particular convention. 569 00:29:52,690 --> 00:29:57,100 This tells you how many bytes are being allocated for that 570 00:29:57,100 --> 00:30:03,110 type in the x86 64 size. 571 00:30:03,110 --> 00:30:04,550 So it's [? veted ?] here for four. 572 00:30:04,550 --> 00:30:06,150 Now, long double is a funny one. 573 00:30:06,150 --> 00:30:08,910 It's actually allocate 16 bytes, but only 574 00:30:08,910 --> 00:30:11,740 10 of them are used. 575 00:30:11,740 --> 00:30:15,500 So basically there are six bytes that get unused by that. 576 00:30:15,500 --> 00:30:19,960 And I think that's for future expansion so that they can 577 00:30:19,960 --> 00:30:21,700 have even wider extension. 578 00:30:21,700 --> 00:30:23,480 This is generally used, of course, for floating 579 00:30:23,480 --> 00:30:25,010 point and so forth. 580 00:30:25,010 --> 00:30:28,480 Now, in the assembly language, each of the 581 00:30:28,480 --> 00:30:34,125 operators has a suffix. 582 00:30:34,125 --> 00:30:37,060 And sometimes, if it's a two operand instruction, it may, 583 00:30:37,060 --> 00:30:39,820 where it's taking things of different sizes, it may have 584 00:30:39,820 --> 00:30:40,780 two suffixes. 585 00:30:40,780 --> 00:30:44,110 But it has a suffix which is a single character that tells 586 00:30:44,110 --> 00:30:52,050 you what the size is that you're working with. 587 00:30:52,050 --> 00:30:55,310 So, for example, B is for byte. 588 00:30:55,310 --> 00:30:57,980 W is for word because originally the 589 00:30:57,980 --> 00:31:01,421 words were 16 bits. 590 00:31:01,421 --> 00:31:05,580 L is for long except that it's not a long 591 00:31:05,580 --> 00:31:07,040 so don't get confused. 592 00:31:07,040 --> 00:31:08,680 L is not long. 593 00:31:08,680 --> 00:31:15,180 Long is a quad word, or Q, four bytes. 594 00:31:15,180 --> 00:31:19,230 And then a float is an S. A double is a D. And a long 595 00:31:19,230 --> 00:31:22,100 double is a T. 596 00:31:22,100 --> 00:31:24,920 So these you will get familiar with. 597 00:31:24,920 --> 00:31:26,160 And they're not so hard. 598 00:31:26,160 --> 00:31:28,290 But that doesn't mean you know them right off the bat. 599 00:31:28,290 --> 00:31:29,650 And it helps to have a cheat sheet. 600 00:31:29,650 --> 00:31:32,860 601 00:31:32,860 --> 00:31:36,550 As I say, the main one not to get confused about is the Ls. 602 00:31:36,550 --> 00:31:42,630 L means something different in x86 than it means in C. 603 00:31:42,630 --> 00:31:49,250 So, for example, here we have a move of, and because it's a 604 00:31:49,250 --> 00:31:54,540 Q, I know that it is an eight byte or a 64-bit operator. 605 00:31:54,540 --> 00:31:57,615 And you can tell that also because it's using RBP and 606 00:31:57,615 --> 00:32:02,630 RAX, both of which are 64-bit registers. 607 00:32:02,630 --> 00:32:05,330 In fact, in assembly, you can actually write it without the 608 00:32:05,330 --> 00:32:13,420 Q, because the assembler can infer when the Q isn't there 609 00:32:13,420 --> 00:32:18,060 that, oh, this is a 64-bit register, that's a 64-bit 610 00:32:18,060 --> 00:32:21,700 register, I bet he means move 64-bits. 611 00:32:21,700 --> 00:32:24,020 So it actually fills that in sometimes. 612 00:32:24,020 --> 00:32:27,220 But sometimes you need to be explicit. 613 00:32:27,220 --> 00:32:27,580 Question? 614 00:32:27,580 --> 00:32:30,418 AUDIENCE: What happens when you actually put 64-bit 615 00:32:30,418 --> 00:32:33,640 registers but you only, and you just put move [? b ?] 616 00:32:33,640 --> 00:32:34,020 or something? 617 00:32:34,020 --> 00:32:34,375 Would it complain? 618 00:32:34,375 --> 00:32:36,490 Would it [UNINTELLIGIBLE]? 619 00:32:36,490 --> 00:32:37,980 CHARLES LEISERSON: Yeah, it would complain. 620 00:32:37,980 --> 00:32:38,910 Yeah, it would complain. 621 00:32:38,910 --> 00:32:46,750 it'll say it's an improperly formed instruction, so, yeah. 622 00:32:46,750 --> 00:32:48,110 And the other thing you can do, of course, 623 00:32:48,110 --> 00:32:50,430 is just try it out. 624 00:32:50,430 --> 00:32:51,920 What happens if? 625 00:32:51,920 --> 00:32:53,200 That's the great thing about computers. 626 00:32:53,200 --> 00:32:56,140 It's easy to do what happened if. 627 00:32:56,140 --> 00:33:00,680 Now, the instruction format is typically an opcode followed 628 00:33:00,680 --> 00:33:03,840 by an operand list. 629 00:33:03,840 --> 00:33:06,760 So the opcode is a short mnemonic identifying the type 630 00:33:06,760 --> 00:33:12,230 of instruction that includes typically the single character 631 00:33:12,230 --> 00:33:15,390 suffix indicating the data type. 632 00:33:15,390 --> 00:33:18,570 However, for some instructions it turns out you can have two 633 00:33:18,570 --> 00:33:20,610 suffixes if the two-- 634 00:33:20,610 --> 00:33:25,670 Most instructions operate on data types of the same size. 635 00:33:25,670 --> 00:33:31,300 But some of them operate on two different sizes in which 636 00:33:31,300 --> 00:33:33,130 case you'll have two suffixes. 637 00:33:33,130 --> 00:33:35,480 If the suffix is missing, it can generally be inferred, as 638 00:33:35,480 --> 00:33:36,710 I mentioned. 639 00:33:36,710 --> 00:33:41,580 Then the operand list is from zero, two, and very rarely 640 00:33:41,580 --> 00:33:45,840 three operands separated by commas. 641 00:33:45,840 --> 00:33:50,090 Now, in the architecture manual, in fact, they say if 642 00:33:50,090 --> 00:33:54,790 you look at it, they'll show you fourth operand. 643 00:33:54,790 --> 00:33:56,260 And I said, four operands? 644 00:33:56,260 --> 00:33:58,270 This documentation says there's only three. 645 00:33:58,270 --> 00:34:00,460 This one says there's four. 646 00:34:00,460 --> 00:34:01,520 I went through the whole 647 00:34:01,520 --> 00:34:03,170 architecture manual last night. 648 00:34:03,170 --> 00:34:06,660 649 00:34:06,660 --> 00:34:10,210 Every time it says four operands, it says N/A, not 650 00:34:10,210 --> 00:34:11,500 applicable. 651 00:34:11,500 --> 00:34:14,540 So I think it's just there reserved or something. 652 00:34:14,540 --> 00:34:20,070 But anyway there is no fourth operand as far as I can tell. 653 00:34:20,070 --> 00:34:25,690 Now, one of the operands is the destination. 654 00:34:25,690 --> 00:34:28,489 And here's where we start to get into some differences. 655 00:34:28,489 --> 00:34:32,139 There's actually two standard formats for assembly language 656 00:34:32,139 --> 00:34:36,630 that are generally called Intel and AT&T. So AT&T was 657 00:34:36,630 --> 00:34:41,469 the original Unix system. 658 00:34:41,469 --> 00:34:45,969 And Intel is what Intel uses for their assembler. 659 00:34:45,969 --> 00:34:51,389 They do the destination operand in the opposite order. 660 00:34:51,389 --> 00:34:55,000 So AT&T, it puts the destination last. 661 00:34:55,000 --> 00:34:57,910 In Intel it puts the destination first. 662 00:34:57,910 --> 00:35:00,720 So when you're reading documentation, you can read 663 00:35:00,720 --> 00:35:01,595 the Intel documentation. 664 00:35:01,595 --> 00:35:04,020 You just have to remember to flip it around if you're 665 00:35:04,020 --> 00:35:07,260 actually writing it as we will be using the AT&T format. 666 00:35:07,260 --> 00:35:09,860 Almost everybody uses AT&T as far as I 667 00:35:09,860 --> 00:35:13,370 can tell except Intel. 668 00:35:13,370 --> 00:35:16,740 So Intel's assembler does it the other way around. 669 00:35:16,740 --> 00:35:20,030 And actually now GCC will actually, you can give it a 670 00:35:20,030 --> 00:35:28,190 directive to say I'm now switching to writing it in 671 00:35:28,190 --> 00:35:29,480 Intel assembler. 672 00:35:29,480 --> 00:35:31,450 So you can actually go back and forth between the two if 673 00:35:31,450 --> 00:35:34,080 you happen to borrow some assembly language code from 674 00:35:34,080 --> 00:35:35,510 somebody else. 675 00:35:35,510 --> 00:35:37,620 So one of them is the destination. 676 00:35:37,620 --> 00:35:42,680 The other operations are read-only, so const in the C++ 677 00:35:42,680 --> 00:35:44,230 terminology. 678 00:35:44,230 --> 00:35:45,540 They're read-only. 679 00:35:45,540 --> 00:35:49,140 So it's always the case that only one of them is going to 680 00:35:49,140 --> 00:35:50,160 be modified. 681 00:35:50,160 --> 00:35:54,770 And that's the one that's the destination of the operation. 682 00:35:54,770 --> 00:35:56,730 In addition in assembler, there are what are called 683 00:35:56,730 --> 00:35:57,570 directives. 684 00:35:57,570 --> 00:36:01,840 Besides the instructions, there are directives. 685 00:36:01,840 --> 00:36:03,740 So first of all there are things like labels. 686 00:36:03,740 --> 00:36:07,780 You can take any instruction and put an identifier and a 687 00:36:07,780 --> 00:36:12,200 colon, and that becomes then a way of naming that 688 00:36:12,200 --> 00:36:14,040 place in your code. 689 00:36:14,040 --> 00:36:16,320 So, for example, jump instructions want to know to 690 00:36:16,320 --> 00:36:18,480 where they're jumping. 691 00:36:18,480 --> 00:36:23,050 And rather than having to know upfront what the address is, 692 00:36:23,050 --> 00:36:25,740 the assembler will calculate what that address is and 693 00:36:25,740 --> 00:36:29,420 everywhere you put x, it'll put in the right value. 694 00:36:29,420 --> 00:36:33,130 And you get to name it symbolically rather than as an 695 00:36:33,130 --> 00:36:35,700 absolute machine location. 696 00:36:35,700 --> 00:36:37,180 There are storage directives. 697 00:36:37,180 --> 00:36:41,560 So, for example, .space 20 says allocate 20 bytes at 698 00:36:41,560 --> 00:36:43,270 location x. 699 00:36:43,270 --> 00:36:49,890 .long says store the constant 172 at y. 700 00:36:49,890 --> 00:36:53,620 It's being stored at y because I said y is here. 701 00:36:53,620 --> 00:36:57,100 And asciz gives you a string that's zero terminated. 702 00:36:57,100 --> 00:37:00,720 So the standard for strings is zero terminated. 703 00:37:00,720 --> 00:37:03,390 You can also, there's one that says give me a 704 00:37:03,390 --> 00:37:05,630 nonterminated string. 705 00:37:05,630 --> 00:37:08,740 So you can have fun with that if you like that. 706 00:37:08,740 --> 00:37:12,150 The align directive says make sure that as you're going 707 00:37:12,150 --> 00:37:14,040 through, so what's happening is the assembler is going 708 00:37:14,040 --> 00:37:16,640 through there, is it's laying these things out in memory 709 00:37:16,640 --> 00:37:19,300 typically sequentially, the way you wrote it down in the 710 00:37:19,300 --> 00:37:23,460 program, in the assembly language program. 711 00:37:23,460 --> 00:37:26,460 If you say align eight, it says advance whatever the 712 00:37:26,460 --> 00:37:29,480 pointer is of where the next thing is going to be put to be 713 00:37:29,480 --> 00:37:31,910 a multiple of eight. 714 00:37:31,910 --> 00:37:34,600 And that way you don't run the risk of where you declare a 715 00:37:34,600 --> 00:37:39,550 character and then you say, OK, and now I want a long or 716 00:37:39,550 --> 00:37:43,930 something, and it's not aligned in a way that the 717 00:37:43,930 --> 00:37:48,700 eight bytes correspond to a multiple of eight the way you 718 00:37:48,700 --> 00:37:50,300 need to in order for the instructions to 719 00:37:50,300 --> 00:37:52,890 properly work on them. 720 00:37:52,890 --> 00:37:56,350 So generally, although, we have byte pointers, most 721 00:37:56,350 --> 00:37:58,445 instructions only work on aligned values. 722 00:37:58,445 --> 00:38:01,760 723 00:38:01,760 --> 00:38:05,050 And for some of them that work on unaligned values, they're 724 00:38:05,050 --> 00:38:09,760 generally slower than the ones that work on aligned values. 725 00:38:09,760 --> 00:38:11,260 There are also segment directives. 726 00:38:11,260 --> 00:38:16,510 So in memory when you run your program, the executing program 727 00:38:16,510 --> 00:38:20,810 starts with the program text down at the bottom of memory. 728 00:38:20,810 --> 00:38:25,830 And then it has fixed data that's not going to change, 729 00:38:25,830 --> 00:38:27,410 static allocation of data. 730 00:38:27,410 --> 00:38:30,870 And then it's got heap, which is dynamically allocated data. 731 00:38:30,870 --> 00:38:32,010 And then it's got stack. 732 00:38:32,010 --> 00:38:36,230 The stack grows downward, and the heap grows upward. 733 00:38:36,230 --> 00:38:39,530 By saying something like text, it says make sure that the 734 00:38:39,530 --> 00:38:42,010 next stuff I'm putting goes into the text segment. 735 00:38:42,010 --> 00:38:45,430 So that's generally where you put your code. 736 00:38:45,430 --> 00:38:47,590 Saying it's in data says make sure it goes in here. 737 00:38:47,590 --> 00:38:51,500 So you may want to have a table, for example. 738 00:38:51,500 --> 00:38:54,230 So, for example, from pentominos you might have a 739 00:38:54,230 --> 00:38:55,470 table there. 740 00:38:55,470 --> 00:38:56,610 There's going to be a fixed table. 741 00:38:56,610 --> 00:38:57,740 You're never going to change it during the 742 00:38:57,740 --> 00:38:59,050 running of the program. 743 00:38:59,050 --> 00:39:00,300 Put it in the data segment. 744 00:39:00,300 --> 00:39:03,790 745 00:39:03,790 --> 00:39:06,060 And then there's also things like scope and linkage 746 00:39:06,060 --> 00:39:06,550 directives. 747 00:39:06,550 --> 00:39:09,640 So saying .global, and you can either spell it incorrectly, 748 00:39:09,640 --> 00:39:14,310 as I have here, or with the a, it's the same thing for the 749 00:39:14,310 --> 00:39:16,640 GNU assembler anyway. 750 00:39:16,640 --> 00:39:21,130 It says make the symbol fib externally visible. 751 00:39:21,130 --> 00:39:23,760 And that makes sure that it goes into the symbol table so 752 00:39:23,760 --> 00:39:26,910 that debuggers and things can look at it. 753 00:39:26,910 --> 00:39:29,980 And there's a lot more of these in the assembler manual, 754 00:39:29,980 --> 00:39:33,140 that link that I showed you to before, tells you what all the 755 00:39:33,140 --> 00:39:33,940 directives mean. 756 00:39:33,940 --> 00:39:35,880 So that when you're looking at code, which mostly you'll be 757 00:39:35,880 --> 00:39:38,680 reading it, making a few changes to it, you can know 758 00:39:38,680 --> 00:39:39,930 what things mean. 759 00:39:39,930 --> 00:39:42,520 760 00:39:42,520 --> 00:39:47,020 So the opcode examples, here's some examples. 761 00:39:47,020 --> 00:39:49,500 There are things like mov, push, and pop. 762 00:39:49,500 --> 00:39:53,970 So, for example, here movslq, this is an interesting one 763 00:39:53,970 --> 00:39:55,150 because it's moving. 764 00:39:55,150 --> 00:40:01,000 The s says extend the sign because I'm moving from a long 765 00:40:01,000 --> 00:40:09,930 from a 32-bit word to a 64-bit word, from 4-bits to 8-bits. 766 00:40:09,930 --> 00:40:14,490 So that's why this one takes two suffixes moving from, 767 00:40:14,490 --> 00:40:17,470 you'll notice, a 32-bit register to a 64-bit register. 768 00:40:17,470 --> 00:40:20,160 769 00:40:20,160 --> 00:40:21,570 So you have to be careful. 770 00:40:21,570 --> 00:40:24,840 This is something I got caught up in the other day. 771 00:40:24,840 --> 00:40:27,960 The results of 32-bit operations are implicitly 772 00:40:27,960 --> 00:40:30,620 extended to 64-bit values. 773 00:40:30,620 --> 00:40:34,330 So if you store something into EAX, for example, it 774 00:40:34,330 --> 00:40:40,220 automatically zeroes out the high order 32-bits of RAX. 775 00:40:40,220 --> 00:40:42,130 Because that's the one that it's embedded in. 776 00:40:42,130 --> 00:40:46,340 However, that's not true for the eight and 16-bit 777 00:40:46,340 --> 00:40:47,580 operations. 778 00:40:47,580 --> 00:40:52,910 If you store into an 8-bit field, an 8-bit part of the 779 00:40:52,910 --> 00:40:55,750 register, it does not zero out the high 780 00:40:55,750 --> 00:40:58,120 order bits of the remainder. 781 00:40:58,120 --> 00:41:02,340 So you just have to be careful when you're doing that. 782 00:41:02,340 --> 00:41:06,370 Most of these are things, by the way, that is more cryptic 783 00:41:06,370 --> 00:41:07,250 when you're looking at stuff. 784 00:41:07,250 --> 00:41:10,850 It's like, oh, how come it's, gee, I thought I had, I'm 785 00:41:10,850 --> 00:41:13,770 returning a double word, but it looks here like it's 786 00:41:13,770 --> 00:41:16,440 returning a 32-bit word, how come I thought I 787 00:41:16,440 --> 00:41:18,270 was returning 64-- 788 00:41:18,270 --> 00:41:20,060 Well, the answer is because it knows the high 789 00:41:20,060 --> 00:41:21,310 order bits are zero. 790 00:41:21,310 --> 00:41:23,343 So it's using the shorter instructions. 791 00:41:23,343 --> 00:41:26,230 792 00:41:26,230 --> 00:41:33,780 And yet it still is having the impact on the 64-bit register. 793 00:41:33,780 --> 00:41:37,000 They're all the arithmetic and logical operations. 794 00:41:37,000 --> 00:41:40,810 So subtracting, once again, the destination is second. 795 00:41:40,810 --> 00:41:43,170 So typically these are two operator things. 796 00:41:43,170 --> 00:41:45,820 So you always have the destination occurs both at the 797 00:41:45,820 --> 00:41:49,130 beginning and on the left hand side and the right hand side, 798 00:41:49,130 --> 00:41:53,027 things like shifts and rotates, control transfer, so 799 00:41:53,027 --> 00:41:56,190 call which does a subroutine jump, return from a 800 00:41:56,190 --> 00:42:00,650 subroutine, a jump instruction that just says make the next 801 00:42:00,650 --> 00:42:03,320 instruction the thing that you're pointing to, and very 802 00:42:03,320 --> 00:42:05,380 important, the jump conditionals where the 803 00:42:05,380 --> 00:42:08,460 condition is a whole bunch of keys that are things like 804 00:42:08,460 --> 00:42:12,420 greater than, less than, and so forth, and different ones 805 00:42:12,420 --> 00:42:16,400 for signed and unsigned and so forth. 806 00:42:16,400 --> 00:42:20,210 So typically the condition is computed by using a compare 807 00:42:20,210 --> 00:42:20,800 instruction. 808 00:42:20,800 --> 00:42:24,570 I probably should have put CNP on here as well, but I didn't. 809 00:42:24,570 --> 00:42:27,540 But the CNP instruction is usually what you use to 810 00:42:27,540 --> 00:42:30,900 compare two things and then you separately jump on what 811 00:42:30,900 --> 00:42:32,960 the condition is. 812 00:42:32,960 --> 00:42:37,330 There's a pretty nice website that has 813 00:42:37,330 --> 00:42:39,840 most of these opcodes. 814 00:42:39,840 --> 00:42:44,590 However, they only deal with the old x86 815 00:42:44,590 --> 00:42:46,970 without the 64-bit extension. 816 00:42:46,970 --> 00:42:49,390 And they use the Intel syntax. 817 00:42:49,390 --> 00:42:50,570 But it's really convenient. 818 00:42:50,570 --> 00:42:55,140 Because they've done a nice job of making a quick jump 819 00:42:55,140 --> 00:42:57,220 table where you can just go, look up the 820 00:42:57,220 --> 00:42:59,200 opcode, and pop it up. 821 00:42:59,200 --> 00:43:02,560 Otherwise you can just look at them in the manual. 822 00:43:02,560 --> 00:43:04,420 Anyway, that's kind of a convenient place. 823 00:43:04,420 --> 00:43:07,180 And, as I say, just beware because it's 32-bit only, and 824 00:43:07,180 --> 00:43:08,350 it's Intel syntax. 825 00:43:08,350 --> 00:43:10,190 Most of the instructions got extended. 826 00:43:10,190 --> 00:43:15,240 I mean it's like, OK, if you do it for eight and 16 and 32, 827 00:43:15,240 --> 00:43:18,720 the operation is not going to change that much to go to 64. 828 00:43:18,720 --> 00:43:20,780 A few of them do, however. 829 00:43:20,780 --> 00:43:27,520 Now, the operands, Intel supports, the x86, which is 830 00:43:27,520 --> 00:43:30,820 Intel and AMD, typically, support all kinds of 831 00:43:30,820 --> 00:43:33,870 addressing modes. 832 00:43:33,870 --> 00:43:36,550 The rule is that only one operand, 833 00:43:36,550 --> 00:43:41,060 however, can address memory. 834 00:43:41,060 --> 00:43:43,180 So you have to pick which is the operand that's going to 835 00:43:43,180 --> 00:43:45,600 address memory if you have multiple operands. 836 00:43:45,600 --> 00:43:47,270 You can't have both operands. 837 00:43:47,270 --> 00:43:50,860 So you can't add two things in memory. 838 00:43:50,860 --> 00:43:52,800 You always have to take something for memory and bring 839 00:43:52,800 --> 00:43:56,880 it into a register and then store it back out. 840 00:43:56,880 --> 00:44:00,610 So the simplest one is two register instructions. 841 00:44:00,610 --> 00:44:04,080 Here I've basically marked the-- 842 00:44:04,080 --> 00:44:05,030 What have I marked here? 843 00:44:05,030 --> 00:44:07,515 I guess I marked-- 844 00:44:07,515 --> 00:44:07,820 I don't know. 845 00:44:07,820 --> 00:44:09,090 Down here I was marking memory. 846 00:44:09,090 --> 00:44:10,820 I'm not sure what I was marking up here. 847 00:44:10,820 --> 00:44:12,070 Because they're both registers. 848 00:44:12,070 --> 00:44:14,270 849 00:44:14,270 --> 00:44:18,710 But in any case, this is just adding RBX into RAX. 850 00:44:18,710 --> 00:44:21,850 And so it takes the contents of RBX adds it in 851 00:44:21,850 --> 00:44:23,730 the contents of RAX. 852 00:44:23,730 --> 00:44:26,360 There's something that's called direct. 853 00:44:26,360 --> 00:44:31,780 So this is, it says, where you move, x is some constant 854 00:44:31,780 --> 00:44:37,130 value, and you move it, the contents of it, into RDI. 855 00:44:37,130 --> 00:44:39,270 So if x, for example, is a location that you've stored a 856 00:44:39,270 --> 00:44:42,570 value in, you can say move whatever is the value at that 857 00:44:42,570 --> 00:44:46,450 location into RDI. 858 00:44:46,450 --> 00:44:51,240 Immediate says, which usually is preceded by a dollar sign, 859 00:44:51,240 --> 00:44:54,990 says move the address of it, move that as a constant. 860 00:44:54,990 --> 00:44:57,970 So x has a value, move that value. 861 00:44:57,970 --> 00:45:02,290 So if you say $3, then you'll move the 862 00:45:02,290 --> 00:45:03,740 constant three into RDI. 863 00:45:03,740 --> 00:45:07,690 If you said mov3 this, you're going to move the contents of 864 00:45:07,690 --> 00:45:11,030 location three in memory. 865 00:45:11,030 --> 00:45:16,250 So that's the difference between direct and immediate. 866 00:45:16,250 --> 00:45:19,720 So the dollar sign says you're taking that 867 00:45:19,720 --> 00:45:21,350 as a literal constant. 868 00:45:21,350 --> 00:45:24,610 And the direct says you're actually going to memory and 869 00:45:24,610 --> 00:45:26,470 fetching it. 870 00:45:26,470 --> 00:45:28,020 Then things start getting interesting. 871 00:45:28,020 --> 00:45:33,215 Register indirect says, in this case, the thing that 872 00:45:33,215 --> 00:45:34,900 you're going to access is the thing 873 00:45:34,900 --> 00:45:38,050 pointed to by that register. 874 00:45:38,050 --> 00:45:42,610 So don't move, in this case, RBX into RAX, move it to the 875 00:45:42,610 --> 00:45:47,920 memory location that RAX is pointing to. 876 00:45:47,920 --> 00:45:51,160 Then you can do register index which says, well, it's 877 00:45:51,160 --> 00:45:57,300 pointing to it, but I want displaced 172-bytes off of 878 00:45:57,300 --> 00:46:01,930 that location, of whatever this is pointing to. 879 00:46:01,930 --> 00:46:04,470 So, for example, if you have a pointer to a record, you can 880 00:46:04,470 --> 00:46:07,050 then have just a single pointer and address all the 881 00:46:07,050 --> 00:46:10,320 fields just by doing register indirect to the different 882 00:46:10,320 --> 00:46:11,940 fields using that same register. 883 00:46:11,940 --> 00:46:15,430 884 00:46:15,430 --> 00:46:17,650 Then there is, it actually-- 885 00:46:17,650 --> 00:46:21,220 I skipped, actually, a few in here that are subsets of this. 886 00:46:21,220 --> 00:46:24,500 This is, I think, the most complicated one that I know. 887 00:46:24,500 --> 00:46:27,520 It's base index scale displacement where base and 888 00:46:27,520 --> 00:46:31,100 index are registers, the scale is two, four, eight, and if 889 00:46:31,100 --> 00:46:33,050 it's not there, it implies one. 890 00:46:33,050 --> 00:46:37,710 The displacement is eight, 16, or a 32-bit value. 891 00:46:37,710 --> 00:46:45,260 And it says take RD-- 892 00:46:45,260 --> 00:46:48,570 oh, I had put the math on here, and then I guess I lost 893 00:46:48,570 --> 00:46:54,640 it-- it says take RDX, multiply it by eight, add RDI, 894 00:46:54,640 --> 00:46:57,141 and add 172. 895 00:46:57,141 --> 00:46:58,391 [WHISTLE]. 896 00:46:58,391 --> 00:47:00,290 897 00:47:00,290 --> 00:47:02,700 So, anyway, you can look in the manual. 898 00:47:02,700 --> 00:47:05,860 So you'll see some of these instructions being generated. 899 00:47:05,860 --> 00:47:08,030 Generally, you're not going to generate these instructions. 900 00:47:08,030 --> 00:47:10,780 So when you see them generated, you can see it. 901 00:47:10,780 --> 00:47:14,220 And then, and this is actually new, it's not in the x86. 902 00:47:14,220 --> 00:47:18,270 It has this instruction pointer where you can actually 903 00:47:18,270 --> 00:47:22,350 access where the current program counter is pointing, 904 00:47:22,350 --> 00:47:25,490 where it is in the code, and store that value, in this case 905 00:47:25,490 --> 00:47:28,660 indexed by six, into RAX. 906 00:47:28,660 --> 00:47:30,760 So you can do it relative where this has 907 00:47:30,760 --> 00:47:33,200 to be a 32-bit constant. 908 00:47:33,200 --> 00:47:37,210 And what's good about that is it allows you then to write 909 00:47:37,210 --> 00:47:39,930 code where you can do things like jump to something that's 910 00:47:39,930 --> 00:47:42,690 relative to the program counter. 911 00:47:42,690 --> 00:47:46,020 And that lets you put the code anywhere in memory, and it 912 00:47:46,020 --> 00:47:48,010 still has the same behavior. 913 00:47:48,010 --> 00:47:51,190 Because you're going relative to where that code is rather 914 00:47:51,190 --> 00:47:54,080 than to an absolute location. 915 00:47:54,080 --> 00:47:56,315 So it allows the code to be relocatable. 916 00:47:56,315 --> 00:48:00,580 917 00:48:00,580 --> 00:48:02,540 So here's-- 918 00:48:02,540 --> 00:48:03,740 Yeah, questions, yeah sure. 919 00:48:03,740 --> 00:48:07,044 AUDIENCE: Why was it the index registers, when you have the 920 00:48:07,044 --> 00:48:09,876 numbering for the register, what does that mean again? 921 00:48:09,876 --> 00:48:11,680 CHARLES LEISERSON: The number before the register? 922 00:48:11,680 --> 00:48:13,099 AUDIENCE: In the instruction [UNINTELLIGIBLE], 923 00:48:13,099 --> 00:48:15,120 that's 60 for RID. 924 00:48:15,120 --> 00:48:18,610 CHARLES LEISERSON: OK, or whatever, whenever it's here, 925 00:48:18,610 --> 00:48:22,440 it's basically saying, add that value to 926 00:48:22,440 --> 00:48:25,900 the contents of RAX. 927 00:48:25,900 --> 00:48:29,490 And so the same thing here, add six to the contents of the 928 00:48:29,490 --> 00:48:30,050 instruction. 929 00:48:30,050 --> 00:48:35,510 So this is six bytes ahead of me in the instruction stream. 930 00:48:35,510 --> 00:48:36,360 OK? 931 00:48:36,360 --> 00:48:39,230 So you can actually say, well, what's that instruction ahead 932 00:48:39,230 --> 00:48:41,380 of me in the instructions stream? 933 00:48:41,380 --> 00:48:42,630 OK? 934 00:48:42,630 --> 00:48:45,350 935 00:48:45,350 --> 00:48:50,490 So here's some examples of essentially the same code and 936 00:48:50,490 --> 00:48:53,470 how it gets compiled. 937 00:48:53,470 --> 00:48:57,420 So here we're going to have a fou1, fou2, fou3. 938 00:48:57,420 --> 00:49:00,690 And in this case we declare x, y, and z 939 00:49:00,690 --> 00:49:02,910 to be unsigned integers. 940 00:49:02,910 --> 00:49:04,450 We set them to some values. 941 00:49:04,450 --> 00:49:10,900 And we just simply say return x plus y or z, 942 00:49:10,900 --> 00:49:12,880 bitwise OR with z. 943 00:49:12,880 --> 00:49:16,450 If you look at what the code is that's generated, it says 944 00:49:16,450 --> 00:49:19,612 move the constant 45 into EAX. 945 00:49:19,612 --> 00:49:22,740 946 00:49:22,740 --> 00:49:23,800 Why does it do that? 947 00:49:23,800 --> 00:49:24,990 Well, let's just see. 948 00:49:24,990 --> 00:49:28,930 Well, the compiler figures out that it knows what 35, seven, 949 00:49:28,930 --> 00:49:29,890 and 45 are. 950 00:49:29,890 --> 00:49:31,560 It computes x plus y. 951 00:49:31,560 --> 00:49:32,975 That's 41. 952 00:49:32,975 --> 00:49:37,930 If you take 41 bitwise OR with 45, it turns out it's masking 953 00:49:37,930 --> 00:49:40,040 the same bits, that's 45. 954 00:49:40,040 --> 00:49:43,600 So the compiler actually can figure this out that all it 955 00:49:43,600 --> 00:49:48,040 has to do is return 45 in a 64-bit register. 956 00:49:48,040 --> 00:49:51,620 Ah, but here it's returning it in a 32-bit register. 957 00:49:51,620 --> 00:49:52,770 What happened? 958 00:49:52,770 --> 00:49:53,990 It's not obeying the type. 959 00:49:53,990 --> 00:49:56,200 The type is supposed to be 64-bits, but 960 00:49:56,200 --> 00:49:58,460 that's a 32-bit register. 961 00:49:58,460 --> 00:50:01,400 Oh, yeah, that's this thing where it automatically zeroes 962 00:50:01,400 --> 00:50:03,540 out the high order bits. 963 00:50:03,540 --> 00:50:07,200 And it uses this instruction, because this is a shorter 964 00:50:07,200 --> 00:50:10,030 instruction than if it did the RAX. 965 00:50:10,030 --> 00:50:12,550 It could do the same thing with RAX, but it would be more 966 00:50:12,550 --> 00:50:13,640 bytes of instruction. 967 00:50:13,640 --> 00:50:17,020 So they saved a couple bytes of instruction by doing that. 968 00:50:17,020 --> 00:50:20,240 So people follow what happened there? 969 00:50:20,240 --> 00:50:21,290 Let's take a look at the next one. 970 00:50:21,290 --> 00:50:23,770 Here it's the same code just let's pass 971 00:50:23,770 --> 00:50:26,890 those things as arguments. 972 00:50:26,890 --> 00:50:29,280 Well, if you remember the calling convention, parameter 973 00:50:29,280 --> 00:50:32,640 one is in RDI, parameter two is in RSI, and 974 00:50:32,640 --> 00:50:35,580 parameter three is in RDX. 975 00:50:35,580 --> 00:50:37,170 So I don't expect you to remember that off 976 00:50:37,170 --> 00:50:37,920 the top your head. 977 00:50:37,920 --> 00:50:41,020 But we have the cheat sheet, and you can figure that out. 978 00:50:41,020 --> 00:50:43,520 So here's what it does is, oh my goodness, what is that 979 00:50:43,520 --> 00:50:44,530 instruction? 980 00:50:44,530 --> 00:50:51,070 This is actually a computation of effective address. 981 00:50:51,070 --> 00:50:54,100 So the effective address is basically saying, and it's 982 00:50:54,100 --> 00:50:56,610 using one of these funny indexing modes. 983 00:50:56,610 --> 00:51:00,650 So what this is actually doing is it's actually adding these 984 00:51:00,650 --> 00:51:05,690 two numbers together, the values stored in those 985 00:51:05,690 --> 00:51:08,940 locations together, and storing it into RAX. 986 00:51:08,940 --> 00:51:16,990 And then it's then OR-ing RDX, what's in RDX, with RAX and 987 00:51:16,990 --> 00:51:17,800 then returning. 988 00:51:17,800 --> 00:51:22,280 Remember that RAX is where the result is always going to be. 989 00:51:22,280 --> 00:51:24,470 So the result is always returned in RAX. 990 00:51:24,470 --> 00:51:27,360 So you can see you have to do a little bit more complicated 991 00:51:27,360 --> 00:51:30,240 addressing in order to pull them out as parameters than if 992 00:51:30,240 --> 00:51:32,690 it could actually figure out what the numbers are. 993 00:51:32,690 --> 00:51:35,890 Last example here is I declared these things before I 994 00:51:35,890 --> 00:51:38,760 ever got their globals. 995 00:51:38,760 --> 00:51:40,540 And so I declared them before I ever got in. 996 00:51:40,540 --> 00:51:42,530 So that means since they're globals, they have a fixed 997 00:51:42,530 --> 00:51:44,200 place in memory. 998 00:51:44,200 --> 00:51:49,180 And so the code that's generated is moving, it turns 999 00:51:49,180 --> 00:51:53,790 out it allocates them right nearby the instructions here. 1000 00:51:53,790 --> 00:51:56,710 And so what it does is it has actually a relative offset for 1001 00:51:56,710 --> 00:52:03,490 x, relativity instruction pointer, put that in RAX, add 1002 00:52:03,490 --> 00:52:08,420 the offset of x into it, and then OR it with the offset of 1003 00:52:08,420 --> 00:52:10,990 z, and then return. 1004 00:52:10,990 --> 00:52:13,960 And so there the constants are actually stored right nearby 1005 00:52:13,960 --> 00:52:19,200 in the code so that they can use this relative offset. 1006 00:52:19,200 --> 00:52:23,470 And the compiler figures out, or the assembler figures out, 1007 00:52:23,470 --> 00:52:26,640 exactly what the offset is that it actually needs to 1008 00:52:26,640 --> 00:52:30,410 substitute for y so that it can be a relative offset from 1009 00:52:30,410 --> 00:52:32,120 the current instruction pointer. 1010 00:52:32,120 --> 00:52:33,970 Notice that, for example, that's going to change 1011 00:52:33,970 --> 00:52:35,980 depending upon the value of y here. 1012 00:52:35,980 --> 00:52:39,700 It's going to change compared to if I accessed y down here. 1013 00:52:39,700 --> 00:52:40,960 It would be a different instruction 1014 00:52:40,960 --> 00:52:43,070 pointer at this point. 1015 00:52:43,070 --> 00:52:45,020 So it actually just goes but it computes what the 1016 00:52:45,020 --> 00:52:50,710 difference is so it knows what the distance is. 1017 00:52:50,710 --> 00:52:53,730 It can compute that at compile time, and then at execution 1018 00:52:53,730 --> 00:52:57,440 time it just uses whatever constant goes in there. 1019 00:52:57,440 --> 00:52:59,540 So the important thing here is just to notice that the code 1020 00:52:59,540 --> 00:53:01,990 depends upon where x, y, and z are allocated. 1021 00:53:01,990 --> 00:53:06,010 1022 00:53:06,010 --> 00:53:11,420 So the first thing to actually look at good code is to 1023 00:53:11,420 --> 00:53:13,400 understand the calling convention 1024 00:53:13,400 --> 00:53:15,915 that's used by the compiler. 1025 00:53:15,915 --> 00:53:19,950 1026 00:53:19,950 --> 00:53:21,550 And here are the basics of it. 1027 00:53:21,550 --> 00:53:24,910 So the register RSP points to the function 1028 00:53:24,910 --> 00:53:26,620 call stack in memory. 1029 00:53:26,620 --> 00:53:29,750 And the call stack grows downward in memory, like in 1030 00:53:29,750 --> 00:53:33,080 that little map I showed you before, so that as you push 1031 00:53:33,080 --> 00:53:36,040 things onto the stack they're getting lower numbered not 1032 00:53:36,040 --> 00:53:37,290 higher numbered. 1033 00:53:37,290 --> 00:53:40,260 1034 00:53:40,260 --> 00:53:43,120 The call instruction pushes the current instruction 1035 00:53:43,120 --> 00:53:47,780 pointer onto the stack, jumps to the call target operand, 1036 00:53:47,780 --> 00:53:51,550 which is basically the address of the thing you're calling. 1037 00:53:51,550 --> 00:53:55,020 So when you do a call, it saves your return 1038 00:53:55,020 --> 00:53:57,580 address on the stack. 1039 00:53:57,580 --> 00:54:00,520 The return instruction pops the return address off the 1040 00:54:00,520 --> 00:54:02,410 stack and returns to the caller. 1041 00:54:02,410 --> 00:54:05,520 It basically says, oh, I know where the return address is. 1042 00:54:05,520 --> 00:54:09,790 I slam that into the current instruction pointer, and that 1043 00:54:09,790 --> 00:54:13,140 becomes the next instruction that's executed. 1044 00:54:13,140 --> 00:54:16,080 Now, there are some software conventions that are used 1045 00:54:16,080 --> 00:54:17,210 that's helpful to know. 1046 00:54:17,210 --> 00:54:20,030 Besides those instruction registers, some of the 1047 00:54:20,030 --> 00:54:23,300 registers are expected to be saved by the caller, some are 1048 00:54:23,300 --> 00:54:26,930 expected to be saved by the callee. 1049 00:54:26,930 --> 00:54:31,460 You're free to violate this in your own little piece of code 1050 00:54:31,460 --> 00:54:34,690 as long as if you're calling something else, 1051 00:54:34,690 --> 00:54:37,260 you're obeying it. 1052 00:54:37,260 --> 00:54:40,090 So you don't have obey this convention in the code you 1053 00:54:40,090 --> 00:54:44,760 write unless you want to interoperate with other stuff. 1054 00:54:44,760 --> 00:54:48,320 So if you, for example, have a leaf procedure, you can decide 1055 00:54:48,320 --> 00:54:50,880 for that leaf procedure, oh, I'm going to make something 1056 00:54:50,880 --> 00:54:54,240 callee saved that was caller saved or whatever as long as 1057 00:54:54,240 --> 00:54:56,840 by the time you return you've cleaned everything up for the 1058 00:54:56,840 --> 00:54:58,450 rest of the world. 1059 00:54:58,450 --> 00:55:00,800 So these are conventions. 1060 00:55:00,800 --> 00:55:03,575 But for the most part, you're not going to violate these, 1061 00:55:03,575 --> 00:55:06,350 and the code that the compiler generates doesn't violate 1062 00:55:06,350 --> 00:55:09,460 these because it expects everything to interoperate. 1063 00:55:09,460 --> 00:55:11,980 So here's how the subroutine linkage works. 1064 00:55:11,980 --> 00:55:15,100 We're going to do an example here where function A calls 1065 00:55:15,100 --> 00:55:18,200 function B which will call function C. And right now, 1066 00:55:18,200 --> 00:55:21,450 we're at the point we're executing B. And so on the 1067 00:55:21,450 --> 00:55:29,600 stack are the arguments that were passed from A to B that 1068 00:55:29,600 --> 00:55:32,020 did not fit within the registers. 1069 00:55:32,020 --> 00:55:33,050 So normally most of the 1070 00:55:33,050 --> 00:55:34,510 arguments are within registers. 1071 00:55:34,510 --> 00:55:38,750 But if you exceed the six registers then, because you 1072 00:55:38,750 --> 00:55:40,430 have a long argument list, then it gets 1073 00:55:40,430 --> 00:55:41,420 passed on the stack. 1074 00:55:41,420 --> 00:55:43,020 And here's where it gets passed. 1075 00:55:43,020 --> 00:55:45,890 The next thing is B's return address. 1076 00:55:45,890 --> 00:55:48,200 This is the thing that got smashed in there 1077 00:55:48,200 --> 00:55:49,890 when you did the call. 1078 00:55:49,890 --> 00:55:51,620 It got pushed onto the stack. 1079 00:55:51,620 --> 00:55:54,290 And then there's what's called a base pointer for A. And this 1080 00:55:54,290 --> 00:55:57,790 is the way that A ends up accessing its local variables. 1081 00:55:57,790 --> 00:56:01,110 And then there's a separate region here where it's going 1082 00:56:01,110 --> 00:56:06,920 to put arguments from B to B's callees if they exceed the six 1083 00:56:06,920 --> 00:56:13,440 registers, if any of the things that B is calling 1084 00:56:13,440 --> 00:56:15,130 require more than the six registers. 1085 00:56:15,130 --> 00:56:16,850 So let's just take a look. 1086 00:56:16,850 --> 00:56:22,320 So function B can access its nonregister values by indexing 1087 00:56:22,320 --> 00:56:25,600 off of RBP. 1088 00:56:25,600 --> 00:56:28,480 So these we say, these are in a linkage block. 1089 00:56:28,480 --> 00:56:30,250 And the reason is because it's actually part of 1090 00:56:30,250 --> 00:56:31,850 A's frame as well. 1091 00:56:31,850 --> 00:56:35,380 It's a shared part of the frame where A stores it into 1092 00:56:35,380 --> 00:56:38,010 memory and then B's going to fetch it out of memory. 1093 00:56:38,010 --> 00:56:39,870 And that's the linkage block. 1094 00:56:39,870 --> 00:56:41,900 So this is positive in memory. 1095 00:56:41,900 --> 00:56:47,500 So if I use a positive offset, I then go up to getting the 1096 00:56:47,500 --> 00:56:55,820 arguments from A. Then it can access its local variables 1097 00:56:55,820 --> 00:56:58,510 from the base point with a negative offset because we're 1098 00:56:58,510 --> 00:56:59,760 growing down in memory. 1099 00:56:59,760 --> 00:57:02,790 1100 00:57:02,790 --> 00:57:08,640 Now, if it wants to call C, what it does is it places the 1101 00:57:08,640 --> 00:57:15,680 nonregister arguments into the reserved linkage block here, 1102 00:57:15,680 --> 00:57:18,730 which are arguments from B to B's callees. 1103 00:57:18,730 --> 00:57:21,890 And that once again acts just as if they're local variables. 1104 00:57:21,890 --> 00:57:28,530 It's positive index off of RBP, sorry, negative offset 1105 00:57:28,530 --> 00:57:29,580 off of RBP. 1106 00:57:29,580 --> 00:57:33,860 So it pushes those things into the argument, into that 1107 00:57:33,860 --> 00:57:38,080 region, if it needs to use that region. 1108 00:57:38,080 --> 00:57:42,130 Then we actually, once it's done that, we have the call. 1109 00:57:42,130 --> 00:57:46,920 So B calls C which saves the return address for B on the 1110 00:57:46,920 --> 00:57:52,440 stack, so it saves it on the stack, and then transfers 1111 00:57:52,440 --> 00:58:00,120 control to C. So now it starts executing C's code. 1112 00:58:00,120 --> 00:58:01,780 And what does C do? 1113 00:58:01,780 --> 00:58:05,120 So C is going to have to advance these pointers to 1114 00:58:05,120 --> 00:58:08,720 refer to its region rather than B's. 1115 00:58:08,720 --> 00:58:12,770 It does it by saving B's base pointer on the stack. 1116 00:58:12,770 --> 00:58:15,695 So it saves this pointer here so that it can restore it when 1117 00:58:15,695 --> 00:58:16,950 it returns. 1118 00:58:16,950 --> 00:58:21,010 It advances, it sets its new base pointer to be where the 1119 00:58:21,010 --> 00:58:25,850 stack pointer is now and then advances the stack pointer to 1120 00:58:25,850 --> 00:58:29,520 allocate space for C's local variables and linkage blocks. 1121 00:58:29,520 --> 00:58:30,770 Watch, here we go. 1122 00:58:30,770 --> 00:58:36,690 1123 00:58:36,690 --> 00:58:38,900 So that ends up being C's frame. 1124 00:58:38,900 --> 00:58:43,360 So notice that B's frame and C's frame are overlapping in 1125 00:58:43,360 --> 00:58:45,157 the linkage block between them. 1126 00:58:45,157 --> 00:58:49,220 1127 00:58:49,220 --> 00:58:52,800 Now, if a function never performs stack allocations 1128 00:58:52,800 --> 00:58:57,090 except during function calls, there's a great compile time 1129 00:58:57,090 --> 00:59:00,810 optimization that the compiler will often do. 1130 00:59:00,810 --> 00:59:03,350 And what it will do is realize that 1131 00:59:03,350 --> 00:59:06,620 this distance is constant. 1132 00:59:06,620 --> 00:59:10,450 So, therefore, it doesn't need RBP. 1133 00:59:10,450 --> 00:59:15,190 It can just do the math and index everything off of RSP as 1134 00:59:15,190 --> 00:59:19,440 long as RSP is always the same, for example for C, when 1135 00:59:19,440 --> 00:59:21,840 C is executing. 1136 00:59:21,840 --> 00:59:24,400 There's certain C commands like [? alaka ?] 1137 00:59:24,400 --> 00:59:26,310 which changed the stack pointer. 1138 00:59:26,310 --> 00:59:31,040 If you use those, the compiler can't do that optimization. 1139 00:59:31,040 --> 00:59:36,200 But if the storage on the stack never changes for a 1140 00:59:36,200 --> 00:59:39,490 given frame, then it's free to make this optimization. 1141 00:59:39,490 --> 00:59:42,360 So you'll see code where RBP has been optimized away. 1142 00:59:42,360 --> 00:59:45,590 1143 00:59:45,590 --> 00:59:48,340 How about some questions before we go 1144 00:59:48,340 --> 00:59:50,030 on and do an example. 1145 00:59:50,030 --> 00:59:51,443 Yeah? 1146 00:59:51,443 --> 00:59:52,856 AUDIENCE: [INAUDIBLE] 1147 00:59:52,856 --> 00:59:56,624 should there be A's return address, where you try to 1148 00:59:56,624 --> 00:59:59,094 [INAUDIBLE]? 1149 00:59:59,094 --> 01:00:00,390 CHARLES LEISERSON: Up? 1150 01:00:00,390 --> 01:00:02,360 Oh, this should be, sorry, this should 1151 01:00:02,360 --> 01:00:03,310 be A's return address. 1152 01:00:03,310 --> 01:00:04,420 Yes, you're right. 1153 01:00:04,420 --> 01:00:05,670 OK, good, typo. 1154 01:00:05,670 --> 01:00:09,090 1155 01:00:09,090 --> 01:00:10,956 Is somebody catching my typos to-- 1156 01:00:10,956 --> 01:00:14,080 1157 01:00:14,080 --> 01:00:16,640 OK, yep, a good one, that should be A's return address. 1158 01:00:16,640 --> 01:00:18,410 Sorry about that. 1159 01:00:18,410 --> 01:00:20,560 This is B's return address. 1160 01:00:20,560 --> 01:00:21,460 Any other questions? 1161 01:00:21,460 --> 01:00:22,070 That's good. 1162 01:00:22,070 --> 01:00:23,720 That means you understand something. 1163 01:00:23,720 --> 01:00:24,970 Hooray. 1164 01:00:24,970 --> 01:00:29,980 1165 01:00:29,980 --> 01:00:31,750 So let's do an example. 1166 01:00:31,750 --> 01:00:34,450 So here's my fib example. 1167 01:00:34,450 --> 01:00:38,150 And I compiled this with minus oh zero. 1168 01:00:38,150 --> 01:00:41,620 Because when I compiled it with minus oh three, I 1169 01:00:41,620 --> 01:00:44,650 couldn't understand what was going on. 1170 01:00:44,650 --> 01:00:48,000 So I compiled this with minus oh zero, which gives me really 1171 01:00:48,000 --> 01:00:48,910 unoptimized code. 1172 01:00:48,910 --> 01:00:52,150 And that lets me be the compiler optimizer. 1173 01:00:52,150 --> 01:00:54,920 So here's the code that it generates. 1174 01:00:54,920 --> 01:00:56,720 So we can take a look at a few things here. 1175 01:00:56,720 --> 01:00:59,370 First of all is declaring fib to be a global. 1176 01:00:59,370 --> 01:01:00,880 And it's got some other things here. 1177 01:01:00,880 --> 01:01:03,640 I actually took out some of the directives that were in 1178 01:01:03,640 --> 01:01:05,590 here that were irrelevant for our purposes. 1179 01:01:05,590 --> 01:01:07,960 If you actually compile it, there's a lot more directives 1180 01:01:07,960 --> 01:01:10,680 that are stuck in there and a lot more labels and things 1181 01:01:10,680 --> 01:01:16,700 that you don't need to understand it. 1182 01:01:16,700 --> 01:01:18,340 There are two labels here. 1183 01:01:18,340 --> 01:01:24,850 And so you can see here basically what's going on is 1184 01:01:24,850 --> 01:01:29,090 we're first of all doing the advancing of the base pointer 1185 01:01:29,090 --> 01:01:31,530 and advancing the stack pointer here. 1186 01:01:31,530 --> 01:01:34,380 That's doing that operation that I showed you, those of 1187 01:01:34,380 --> 01:01:37,680 moving the base and stack pointer up. 1188 01:01:37,680 --> 01:01:45,170 And then at the end here this is equivalent to doing, a 1189 01:01:45,170 --> 01:01:47,750 leave instruction is equivalent to undoing that. 1190 01:01:47,750 --> 01:01:52,110 So Intel lets you do one leave instruction rather than making 1191 01:01:52,110 --> 01:01:55,580 you put these instructions in every time. 1192 01:01:55,580 --> 01:01:59,230 It's exactly the same thing. 1193 01:01:59,230 --> 01:02:01,590 But in any case, let's just sort of see 1194 01:02:01,590 --> 01:02:02,940 what's going on here. 1195 01:02:02,940 --> 01:02:07,370 So we're pushing some storage. 1196 01:02:07,370 --> 01:02:10,080 This is saving a register here. 1197 01:02:10,080 --> 01:02:14,750 We're then advancing the stack pointer to store 24 bytes of 1198 01:02:14,750 --> 01:02:16,580 temporary storage. 1199 01:02:16,580 --> 01:02:19,950 And then we're start to do some computations here. 1200 01:02:19,950 --> 01:02:22,580 This looks like we're comparing one with something 1201 01:02:22,580 --> 01:02:24,880 and then doing a ja. 1202 01:02:24,880 --> 01:02:27,510 So this is a jump above. 1203 01:02:27,510 --> 01:02:29,370 This is the unsigned version. 1204 01:02:29,370 --> 01:02:32,380 What you're looking is to see, here we say if n is less than 1205 01:02:32,380 --> 01:02:36,450 two, in fact, what it's doing is saying if n is greater than 1206 01:02:36,450 --> 01:02:38,770 one go to L4. 1207 01:02:38,770 --> 01:02:42,280 1208 01:02:42,280 --> 01:02:43,640 So it's actually doing the other one. 1209 01:02:43,640 --> 01:02:47,020 So you can see then L4 is, what happens is the one that 1210 01:02:47,020 --> 01:02:50,200 has the two calls to fib, recursive calls, so that's 1211 01:02:50,200 --> 01:02:52,850 this part of the code, and it's doing that if it's 1212 01:02:52,850 --> 01:02:53,470 greater than one. 1213 01:02:53,470 --> 01:02:58,310 And otherwise it's going to execute these instructions, 1214 01:02:58,310 --> 01:03:01,820 which are basically returning n. 1215 01:03:01,820 --> 01:03:04,020 And so it basically does some computations. 1216 01:03:04,020 --> 01:03:07,510 And then both of them converge here where it moves the 1217 01:03:07,510 --> 01:03:11,490 results and then pops it off and so forth. 1218 01:03:11,490 --> 01:03:13,810 So that's sort of the outline of what's going on there. 1219 01:03:13,810 --> 01:03:16,920 So let's dive in here a little bit and sort of see what's 1220 01:03:16,920 --> 01:03:21,820 going on, see if we can read this a little bit more closely 1221 01:03:21,820 --> 01:03:24,180 and whether we can optimize it. 1222 01:03:24,180 --> 01:03:26,500 So the first thing that I noticed in looking at this is 1223 01:03:26,500 --> 01:03:30,330 look at all this memory addressing that we're doing. 1224 01:03:30,330 --> 01:03:37,070 What do you suppose this thing is, minus 16% RBP? 1225 01:03:37,070 --> 01:03:38,080 So this is the base pointer. 1226 01:03:38,080 --> 01:03:42,340 So this is a local variable because it's a negative offset 1227 01:03:42,340 --> 01:03:43,790 off of the base pointer. 1228 01:03:43,790 --> 01:03:45,250 What do you think it's doing there? 1229 01:03:45,250 --> 01:03:48,720 1230 01:03:48,720 --> 01:03:49,970 What's stored in here? 1231 01:03:49,970 --> 01:03:54,880 1232 01:03:54,880 --> 01:03:58,960 Yeah, this is where n is being stored. 1233 01:03:58,960 --> 01:03:59,790 Because what are we doing? 1234 01:03:59,790 --> 01:04:04,240 We're trying to compare n with one here even though it says 1235 01:04:04,240 --> 01:04:05,470 two up there. 1236 01:04:05,470 --> 01:04:08,970 We're comparing it with one here. 1237 01:04:08,970 --> 01:04:11,460 And so I look at that, and I say, look, I'm comparing it 1238 01:04:11,460 --> 01:04:19,710 with one, then I'm jumping to L4, then I jump to L4 or not. 1239 01:04:19,710 --> 01:04:21,310 And then let's say I don't. 1240 01:04:21,310 --> 01:04:25,320 Well, then the first thing I do is I move n into RAX. 1241 01:04:25,320 --> 01:04:28,530 But wait a minute, I just compared it with that. 1242 01:04:28,530 --> 01:04:29,230 So I'm accessing n again. 1243 01:04:29,230 --> 01:04:31,100 I'm accessing it a third time. 1244 01:04:31,100 --> 01:04:32,930 How about if I try to store that stuff 1245 01:04:32,930 --> 01:04:34,180 in a register instead? 1246 01:04:34,180 --> 01:04:36,520 1247 01:04:36,520 --> 01:04:39,830 So what I did is I picked the RDI register, because that one 1248 01:04:39,830 --> 01:04:44,630 happens to be available, and I said do they, if you look 1249 01:04:44,630 --> 01:04:45,320 here, what did we do? 1250 01:04:45,320 --> 01:04:51,350 We stored RDI, which is the first argument, into memory. 1251 01:04:51,350 --> 01:04:53,270 And then we compared with it in memory. 1252 01:04:53,270 --> 01:04:54,640 Why don't we compare with it in RDI? 1253 01:04:54,640 --> 01:04:57,230 1254 01:04:57,230 --> 01:04:57,520 Right? 1255 01:04:57,520 --> 01:05:02,655 Duh, stupid compiler, well, because I had minus oh zero. 1256 01:05:02,655 --> 01:05:05,610 1257 01:05:05,610 --> 01:05:07,590 OK, so I can do that improvement. 1258 01:05:07,590 --> 01:05:11,140 So what I did was I edited it to put RDI there and RDI here 1259 01:05:11,140 --> 01:05:12,410 and RDI here. 1260 01:05:12,410 --> 01:05:15,990 And I went up and I said, what about RDI here? 1261 01:05:15,990 --> 01:05:17,410 Why didn't I replace that one? 1262 01:05:17,410 --> 01:05:27,130 1263 01:05:27,130 --> 01:05:28,650 No, there's no loop going on here. 1264 01:05:28,650 --> 01:05:29,694 It's recursion. 1265 01:05:29,694 --> 01:05:30,944 AUDIENCE: [INAUDIBLE PHRASE] 1266 01:05:30,944 --> 01:05:33,870 1267 01:05:33,870 --> 01:05:35,690 CHARLES LEISERSON: Yeah, the problem is that when I call 1268 01:05:35,690 --> 01:05:41,020 fib, RDI gets garbaged on me. 1269 01:05:41,020 --> 01:05:45,590 Because RDI is going to be the first argument to-- 1270 01:05:45,590 --> 01:05:48,330 See it's being garbaged here? 1271 01:05:48,330 --> 01:05:51,360 It's garbage as far as my use of it for n. 1272 01:05:51,360 --> 01:05:55,790 It's being used to pass n minus 1 as the argument to the 1273 01:05:55,790 --> 01:05:57,040 recursive call. 1274 01:05:57,040 --> 01:06:02,670 1275 01:06:02,670 --> 01:06:05,500 So I can't replace this one after fib because RDI no 1276 01:06:05,500 --> 01:06:06,420 longer has it. 1277 01:06:06,420 --> 01:06:07,810 Because I had to leave it. 1278 01:06:07,810 --> 01:06:12,350 But even so I went from 5.45 seconds for the original code 1279 01:06:12,350 --> 01:06:14,930 to 4.09 seconds when I compile that just 1280 01:06:14,930 --> 01:06:16,300 with that little change. 1281 01:06:16,300 --> 01:06:19,360 I felt pretty good. 1282 01:06:19,360 --> 01:06:20,200 I felt pretty good. 1283 01:06:20,200 --> 01:06:23,090 So then I wanted more. 1284 01:06:23,090 --> 01:06:23,710 That was fun. 1285 01:06:23,710 --> 01:06:24,960 I wanted more. 1286 01:06:24,960 --> 01:06:27,570 1287 01:06:27,570 --> 01:06:30,660 So what was the next thing I noticed? 1288 01:06:30,660 --> 01:06:31,720 I noticed that-- 1289 01:06:31,720 --> 01:06:34,260 And by the way almost all the things, that 1290 01:06:34,260 --> 01:06:35,620 stuff I did last night. 1291 01:06:35,620 --> 01:06:37,870 This is what I did an hour before class so we'll see 1292 01:06:37,870 --> 01:06:39,960 whether it-- 1293 01:06:39,960 --> 01:06:44,260 So then I noticed that, look, we're moving this stuff here. 1294 01:06:44,260 --> 01:06:46,420 We keep using minus 24. 1295 01:06:46,420 --> 01:06:48,870 And once again, memory operations are expensive 1296 01:06:48,870 --> 01:06:50,290 compared to register operations. 1297 01:06:50,290 --> 01:06:51,340 Let me try to get rid them. 1298 01:06:51,340 --> 01:06:52,850 What do you suppose is in here? 1299 01:06:52,850 --> 01:06:59,160 1300 01:06:59,160 --> 01:07:08,990 So look, we move RAX into the local variable minus 24. 1301 01:07:08,990 --> 01:07:11,310 And then we jump to L5. 1302 01:07:11,310 --> 01:07:13,370 And we move minus 24 into RAX. 1303 01:07:13,370 --> 01:07:16,950 1304 01:07:16,950 --> 01:07:19,490 That seems kind of unnecessary. 1305 01:07:19,490 --> 01:07:22,470 Here we move RBX into minus 24. 1306 01:07:22,470 --> 01:07:26,490 Then we move minus 24 into RAX. 1307 01:07:26,490 --> 01:07:29,350 What is this value first of all that I'm storing there? 1308 01:07:29,350 --> 01:07:35,840 1309 01:07:35,840 --> 01:07:39,050 What's going to be in RAX at the very end? 1310 01:07:39,050 --> 01:07:42,730 1311 01:07:42,730 --> 01:07:46,620 RAX is the return value. 1312 01:07:46,620 --> 01:07:50,010 So I'm trying to save, here in this case, this is the branch 1313 01:07:50,010 --> 01:07:52,010 where I just want to return n. 1314 01:07:52,010 --> 01:07:55,185 I just want to put RAX to have it return, be 1315 01:07:55,185 --> 01:07:57,720 in RAX when I return. 1316 01:07:57,720 --> 01:07:58,910 So I've got the value here. 1317 01:07:58,910 --> 01:07:59,440 It's just n. 1318 01:07:59,440 --> 01:08:00,395 It was in RDI. 1319 01:08:00,395 --> 01:08:01,750 It's now in RAX. 1320 01:08:01,750 --> 01:08:04,240 But that's clearly unnecessary. 1321 01:08:04,240 --> 01:08:07,940 Why go put it into memory and then take it back out again? 1322 01:08:07,940 --> 01:08:11,140 And here just put it in RAX directly. 1323 01:08:11,140 --> 01:08:12,760 So that's what I did. 1324 01:08:12,760 --> 01:08:17,180 I basically, instead of moving it here, I changed this 1325 01:08:17,180 --> 01:08:20,180 instruction that said add it and put in RBX, I said, no, 1326 01:08:20,180 --> 01:08:21,810 don't put it in RBX. 1327 01:08:21,810 --> 01:08:27,200 Let's just add RBX into RAX, and then it's right there. 1328 01:08:27,200 --> 01:08:30,800 And this one, get rid of those so that it's now moved into 1329 01:08:30,800 --> 01:08:33,840 RAX and it's in RAX. 1330 01:08:33,840 --> 01:08:35,000 So I did that. 1331 01:08:35,000 --> 01:08:39,319 I dropped to 3.9 seconds. 1332 01:08:39,319 --> 01:08:40,630 That felt pretty good, too. 1333 01:08:40,630 --> 01:08:45,229 In addition, I got rid of this extra variable. 1334 01:08:45,229 --> 01:08:49,600 So now I could actually reduce my storage requirements. 1335 01:08:49,600 --> 01:08:53,000 However, when I measured it with this being 24 and not 1336 01:08:53,000 --> 01:08:55,970 being 24, it was the same speed. 1337 01:08:55,970 --> 01:08:58,870 So it's like, eh, but I didn't want to 1338 01:08:58,870 --> 01:09:02,260 waste the storage anyway. 1339 01:09:02,260 --> 01:09:05,200 So then I looked a little bit further. 1340 01:09:05,200 --> 01:09:10,660 And I noticed that I want to get rid of 1341 01:09:10,660 --> 01:09:13,750 this access to n here. 1342 01:09:13,750 --> 01:09:17,450 So basically I'm subtracting it, and I'm storing n. 1343 01:09:17,450 --> 01:09:18,569 How can I get rid of it? 1344 01:09:18,569 --> 01:09:21,020 And this took me a little while to figure out. 1345 01:09:21,020 --> 01:09:23,090 What I realized is, look, we're storing 1346 01:09:23,090 --> 01:09:27,399 stuff away in RBX. 1347 01:09:27,399 --> 01:09:31,770 We have RBX as an available register because I saved the 1348 01:09:31,770 --> 01:09:34,680 value of RBX with this push instruction there. 1349 01:09:34,680 --> 01:09:36,359 So RBX is an available register. 1350 01:09:36,359 --> 01:09:46,120 We're using it to keep the return value of the 1351 01:09:46,120 --> 01:09:47,370 first call to fib. 1352 01:09:47,370 --> 01:09:49,510 1353 01:09:49,510 --> 01:09:52,540 So I'm going to use it for their first call to fib so 1354 01:09:52,540 --> 01:09:55,000 that when I make the second call to fib I can then add the 1355 01:09:55,000 --> 01:09:56,250 two things together. 1356 01:09:56,250 --> 01:09:59,360 1357 01:09:59,360 --> 01:10:04,030 Well, how about if before the first call the fib, why don't 1358 01:10:04,030 --> 01:10:08,150 I use it to store the value of n and then use it to store the 1359 01:10:08,150 --> 01:10:15,810 value of the return value of fib of n minus 1? 1360 01:10:15,810 --> 01:10:17,590 So I did that. 1361 01:10:17,590 --> 01:10:22,640 And that took a little bit of moving things 1362 01:10:22,640 --> 01:10:23,550 around a little bit. 1363 01:10:23,550 --> 01:10:26,380 But I managed to get rid of it by using RBX for two different 1364 01:10:26,380 --> 01:10:31,240 purposes, one to store the temporary, the value of n, and 1365 01:10:31,240 --> 01:10:39,010 the other to store the return value when I need it. 1366 01:10:39,010 --> 01:10:40,300 And when I did that, I got it all the way 1367 01:10:40,300 --> 01:10:44,640 down to 3.61 seconds. 1368 01:10:44,640 --> 01:10:47,380 I actually ran it with minus oh three, 1369 01:10:47,380 --> 01:10:49,650 took about two seconds. 1370 01:10:49,650 --> 01:10:51,720 So I think I can keep my day job. 1371 01:10:51,720 --> 01:10:54,280 1372 01:10:54,280 --> 01:10:57,070 But kind of fun to go in and sort of see what are the 1373 01:10:57,070 --> 01:10:58,500 things that can be done. 1374 01:10:58,500 --> 01:11:00,520 And you can get a very good sense of what's going on. 1375 01:11:00,520 --> 01:11:03,000 The more important thing is when you look at compilers 1376 01:11:03,000 --> 01:11:05,280 generating your code, as we saw on the last lecture on 1377 01:11:05,280 --> 01:11:10,480 profiling, you can see, oh, it did something silly here. 1378 01:11:10,480 --> 01:11:12,000 So you can actually go and say, oh, it's 1379 01:11:12,000 --> 01:11:13,130 doing something silly. 1380 01:11:13,130 --> 01:11:14,990 We can do a better job than that. 1381 01:11:14,990 --> 01:11:17,700 Or, oh, I didn't realize I'd declared this an int when in 1382 01:11:17,700 --> 01:11:23,100 fact, if I declared it a unsigned int 64, it actually 1383 01:11:23,100 --> 01:11:25,260 would produce faster, better code. 1384 01:11:25,260 --> 01:11:25,480 Yeah? 1385 01:11:25,480 --> 01:11:26,868 Question? 1386 01:11:26,868 --> 01:11:27,362 AUDIENCE: Sorry. 1387 01:11:27,362 --> 01:11:32,302 So when you said that when you run it with minus oh three, 1388 01:11:32,302 --> 01:11:34,772 what you're just saying is even though you optimized the 1389 01:11:34,772 --> 01:11:35,266 [INAUDIBLE]? 1390 01:11:35,266 --> 01:11:38,420 CHARLES LEISERSON: As the compiler, that's why I said I 1391 01:11:38,420 --> 01:11:39,670 can keep my day job. 1392 01:11:39,670 --> 01:11:43,060 1393 01:11:43,060 --> 01:11:46,650 So simple optimization strategies, if you're playing 1394 01:11:46,650 --> 01:11:49,090 with things, is you try to keep values and registers to 1395 01:11:49,090 --> 01:11:51,320 eliminate excess memory traffic. 1396 01:11:51,320 --> 01:11:55,550 You can optimize naive function call linkage. 1397 01:11:55,550 --> 01:11:58,790 And the most important thing probably is constant fold. 1398 01:11:58,790 --> 01:12:03,690 Look to see where you've got constants that 1399 01:12:03,690 --> 01:12:05,060 can be combined together. 1400 01:12:05,060 --> 01:12:07,510 There are other optimizations that compilers do like common 1401 01:12:07,510 --> 01:12:10,300 subexpression elimination and so forth. 1402 01:12:10,300 --> 01:12:12,110 But these are sort of the ones, if you're doing it by 1403 01:12:12,110 --> 01:12:14,870 hand, these are sort of things to focus on, particularly 1404 01:12:14,870 --> 01:12:20,690 number one, just get rid of excess memory traffic. 1405 01:12:20,690 --> 01:12:23,120 Let me say, by the way, in doing this I also went down a 1406 01:12:23,120 --> 01:12:25,670 bunch of dead ends, things that I said, oh, this should 1407 01:12:25,670 --> 01:12:28,840 definitely save, and then it was slower. 1408 01:12:28,840 --> 01:12:31,700 And then we look at it, and it turns out, oh, my branch 1409 01:12:31,700 --> 01:12:34,410 misprediction rate is going way up and so forth. 1410 01:12:34,410 --> 01:12:35,540 That's why you have a profiler. 1411 01:12:35,540 --> 01:12:36,830 Because you don't want to do this blind. 1412 01:12:36,830 --> 01:12:39,780 1413 01:12:39,780 --> 01:12:44,860 Now, how does the compiler compile some common high level 1414 01:12:44,860 --> 01:12:46,110 structures? 1415 01:12:46,110 --> 01:12:48,130 1416 01:12:48,130 --> 01:12:52,590 So if you have a conditional, for example, if p, do the 1417 01:12:52,590 --> 01:12:56,360 ctrue clause, else do the cfalse clause, what it does 1418 01:12:56,360 --> 01:12:59,910 basically is it generates instructions to evaluate p. 1419 01:12:59,910 --> 01:13:02,570 And then it does a jump with the condition to see if p is 1420 01:13:02,570 --> 01:13:06,650 false to the else clause and executes those instruction. 1421 01:13:06,650 --> 01:13:11,730 And then otherwise it passes through, does the true clause, 1422 01:13:11,730 --> 01:13:12,930 and then jumps to the end. 1423 01:13:12,930 --> 01:13:16,800 And you'll see that pattern in the code when you look at it. 1424 01:13:16,800 --> 01:13:18,550 So that's a very common pattern for doing 1425 01:13:18,550 --> 01:13:21,370 conditionals. 1426 01:13:21,370 --> 01:13:24,660 Compiling while loops is kind of interesting because most 1427 01:13:24,660 --> 01:13:27,520 while loops start out with a jump. 1428 01:13:27,520 --> 01:13:29,440 So here are the instructions for the body 1429 01:13:29,440 --> 01:13:30,810 of the while loop. 1430 01:13:30,810 --> 01:13:31,850 And here's the test. 1431 01:13:31,850 --> 01:13:35,280 And what they usually do is they jump to the test, they 1432 01:13:35,280 --> 01:13:37,650 evaluate the condition, and if it's true, 1433 01:13:37,650 --> 01:13:38,460 they jump to the loop. 1434 01:13:38,460 --> 01:13:40,470 Otherwise, they fall through. 1435 01:13:40,470 --> 01:13:43,740 And then they go back, do the loop sometimes for the first 1436 01:13:43,740 --> 01:13:44,460 time, et cetera. 1437 01:13:44,460 --> 01:13:48,700 So that's kind of the pattern for a while loop. 1438 01:13:48,700 --> 01:13:51,900 For a for loop, they basically just convert it 1439 01:13:51,900 --> 01:13:53,610 into a while loop. 1440 01:13:53,610 --> 01:13:56,530 You basically take the initialization code. 1441 01:13:56,530 --> 01:13:58,030 You execute that. 1442 01:13:58,030 --> 01:14:00,780 Then while the condition is true, you do the code followed 1443 01:14:00,780 --> 01:14:03,150 by whatever the next code is. 1444 01:14:03,150 --> 01:14:08,510 And so it ends up converting for loops into while loops. 1445 01:14:08,510 --> 01:14:13,230 Now, arrays are, how do we go about implementing data types? 1446 01:14:13,230 --> 01:14:18,670 Arrays are just blocks of memory. 1447 01:14:18,670 --> 01:14:23,440 So you can have basically three different types of array 1448 01:14:23,440 --> 01:14:26,050 depending upon where it gets allocated, either allocated in 1449 01:14:26,050 --> 01:14:28,830 the data segment, allocate on the heap, or 1450 01:14:28,830 --> 01:14:30,080 allocated on the stack. 1451 01:14:30,080 --> 01:14:32,830 1452 01:14:32,830 --> 01:14:35,090 Sometimes even the static arrays these days can be 1453 01:14:35,090 --> 01:14:37,600 allocated in the code segment, if you're not 1454 01:14:37,600 --> 01:14:40,010 going to change them. 1455 01:14:40,010 --> 01:14:47,600 So one thing is to understand that arrays and pointers are 1456 01:14:47,600 --> 01:14:50,580 almost the same thing. 1457 01:14:50,580 --> 01:14:53,820 If you have an array, that's a pointer to a place in memory 1458 01:14:53,820 --> 01:14:56,360 where the array begins. 1459 01:14:56,360 --> 01:15:00,260 And a zero is the same as the value you get when you 1460 01:15:00,260 --> 01:15:04,210 dereference the pointer to a. 1461 01:15:04,210 --> 01:15:08,300 A pointer, if you think about it, is actually just an index 1462 01:15:08,300 --> 01:15:10,970 into the array of all memory. 1463 01:15:10,970 --> 01:15:13,920 And the hardware allows you to index into the 1464 01:15:13,920 --> 01:15:14,890 array of all memory. 1465 01:15:14,890 --> 01:15:18,530 Well, it can also allow you to index into any subregion of 1466 01:15:18,530 --> 01:15:19,570 that memory. 1467 01:15:19,570 --> 01:15:21,740 And that's why arrays and pointers are basically the 1468 01:15:21,740 --> 01:15:22,990 same thing. 1469 01:15:22,990 --> 01:15:25,200 1470 01:15:25,200 --> 01:15:26,400 Here's a little quiz. 1471 01:15:26,400 --> 01:15:28,900 What is eight of a? 1472 01:15:28,900 --> 01:15:32,820 1473 01:15:32,820 --> 01:15:34,300 AUDIENCE: The a elements? 1474 01:15:34,300 --> 01:15:38,230 CHARLES LEISERSON: Yeah, it's basically a of eight. 1475 01:15:38,230 --> 01:15:41,720 It's basically a of eight because the addressing that's 1476 01:15:41,720 --> 01:15:43,540 going on is essentially the same. 1477 01:15:43,540 --> 01:15:45,600 Even though we prefer to write it-- 1478 01:15:45,600 --> 01:15:48,830 If you start writing code like this, I guarantee that at some 1479 01:15:48,830 --> 01:15:52,355 companies they'll get very angry at you even though you 1480 01:15:52,355 --> 01:15:55,910 say, it's the same thing. 1481 01:15:55,910 --> 01:15:59,890 But what's going on is you're actually taking the base, the 1482 01:15:59,890 --> 01:16:02,570 address of a, you're adding eight to it, and then 1483 01:16:02,570 --> 01:16:05,130 dereferencing that value. 1484 01:16:05,130 --> 01:16:08,680 And indeed, even in C, they actually do all the coercions, 1485 01:16:08,680 --> 01:16:11,340 even if eight is a different type, it actually does the 1486 01:16:11,340 --> 01:16:13,260 coercions properly so that they are 1487 01:16:13,260 --> 01:16:14,620 actually the same thing. 1488 01:16:14,620 --> 01:16:17,860 Because it does them after it's converted it into a 1489 01:16:17,860 --> 01:16:22,090 dereference of a plus 8. 1490 01:16:22,090 --> 01:16:26,290 So it's kind of interesting that it works right even 1491 01:16:26,290 --> 01:16:29,420 though when I looked at that I say, well, what if it's bytes 1492 01:16:29,420 --> 01:16:32,560 verses words and so forth? 1493 01:16:32,560 --> 01:16:32,750 Yeah? 1494 01:16:32,750 --> 01:16:34,004 Question? 1495 01:16:34,004 --> 01:16:37,270 AUDIENCE: Will there be a performance difference between 1496 01:16:37,270 --> 01:16:41,450 putting data on those three different arrays? 1497 01:16:41,450 --> 01:16:43,140 CHARLES LEISERSON: Yes, there can be. 1498 01:16:43,140 --> 01:16:46,730 In particular, static array, it knows exactly where the 1499 01:16:46,730 --> 01:16:48,770 base pointer is as a constant. 1500 01:16:48,770 --> 01:16:51,470 Whereas the others it has to actually figure out where it 1501 01:16:51,470 --> 01:16:53,600 is in the heap you need a pointer to it 1502 01:16:53,600 --> 01:16:54,220 to dereference it. 1503 01:16:54,220 --> 01:16:57,716 It can't put it right into the instruction stream itself. 1504 01:16:57,716 --> 01:16:58,966 AUDIENCE: [INAUDIBLE] 1505 01:16:58,966 --> 01:17:01,524 1506 01:17:01,524 --> 01:17:03,966 those would be just constant [INAUDIBLE], right? 1507 01:17:03,966 --> 01:17:08,746 So I can either put that in the static array or-- 1508 01:17:08,746 --> 01:17:10,940 CHARLES LEISERSON: Yes, generally it's faster to have 1509 01:17:10,940 --> 01:17:14,430 it in a static array if you can. 1510 01:17:14,430 --> 01:17:16,740 I want to finish up here so that-- 1511 01:17:16,740 --> 01:17:18,250 We have structs. 1512 01:17:18,250 --> 01:17:21,470 Structs are just blocks of memory also. 1513 01:17:21,470 --> 01:17:23,190 So you can have a bunch of things here. 1514 01:17:23,190 --> 01:17:27,850 This is a bad way to declare a struct because the fields are 1515 01:17:27,850 --> 01:17:29,660 stored next to each other generally in the 1516 01:17:29,660 --> 01:17:32,240 order you give them. 1517 01:17:32,240 --> 01:17:35,470 So here it says it's x and then i and double. 1518 01:17:35,470 --> 01:17:37,740 What happens here is you have to be careful 1519 01:17:37,740 --> 01:17:39,930 about alignment issues. 1520 01:17:39,930 --> 01:17:41,400 So if you do [? char, ?] 1521 01:17:41,400 --> 01:17:44,330 it's then got to pad it out to get to the next 1522 01:17:44,330 --> 01:17:45,740 alignment for an int. 1523 01:17:45,740 --> 01:17:47,460 And it's got to pad that out to get the next 1524 01:17:47,460 --> 01:17:48,970 alignment for a double. 1525 01:17:48,970 --> 01:17:51,380 Whereas if you do it in the opposite order, it starts 1526 01:17:51,380 --> 01:17:52,580 packing them. 1527 01:17:52,580 --> 01:17:55,220 So generally it's best to declare longer fields before 1528 01:17:55,220 --> 01:17:57,915 shorter fields because then you know when you're finished 1529 01:17:57,915 --> 01:17:59,410 with the longer fields, you're already 1530 01:17:59,410 --> 01:18:00,880 aligned for shorter fields. 1531 01:18:00,880 --> 01:18:03,660 1532 01:18:03,660 --> 01:18:06,430 Like arrays, there are static, dynamic, and local structs. 1533 01:18:06,430 --> 01:18:07,340 So that's all they are. 1534 01:18:07,340 --> 01:18:08,965 And so you'll see in the indexing-- 1535 01:18:08,965 --> 01:18:11,590 1536 01:18:11,590 --> 01:18:13,600 There's also stuff that-- 1537 01:18:13,600 --> 01:18:18,760 and actually this is important for one of the binary puzzles 1538 01:18:18,760 --> 01:18:21,960 we gave to figure out what it does-- 1539 01:18:21,960 --> 01:18:23,950 there are what are called SIMD instruction. 1540 01:18:23,950 --> 01:18:27,220 This is single instruction multiple data instructions 1541 01:18:27,220 --> 01:18:29,740 where a single instruction operates on 1542 01:18:29,740 --> 01:18:32,980 multiple pieces of data. 1543 01:18:32,980 --> 01:18:34,550 They operate on smaller vectors. 1544 01:18:34,550 --> 01:18:40,570 And there are 16 128-bit XMM registers, which you can view 1545 01:18:40,570 --> 01:18:44,160 as two 64-bit values or four 32-bit values. 1546 01:18:44,160 --> 01:18:45,860 And you can do an operation on it. 1547 01:18:45,860 --> 01:18:48,520 So this is used for multimedia, for streaming 1548 01:18:48,520 --> 01:18:51,180 applications, and so forth where you're trying to shove a 1549 01:18:51,180 --> 01:18:54,330 lot of data through and you're doing the same repeated stuff 1550 01:18:54,330 --> 01:18:56,270 on the things at time. 1551 01:18:56,270 --> 01:18:59,160 So there are instructions that operate on multiple values. 1552 01:18:59,160 --> 01:19:04,750 For example, here we're moving four 32-bit ints into this 1553 01:19:04,750 --> 01:19:07,370 particular XMM register. 1554 01:19:07,370 --> 01:19:10,650 And similarly here's another one, we're adding it. 1555 01:19:10,650 --> 01:19:12,750 And you can look at the manual for these. 1556 01:19:12,750 --> 01:19:18,740 So you may come across these because they're using up parts 1557 01:19:18,740 --> 01:19:19,450 of the machine. 1558 01:19:19,450 --> 01:19:22,990 Mostly those are the kinds of things we can say look it up 1559 01:19:22,990 --> 01:19:24,790 in the manual, because nobody's going to remember all 1560 01:19:24,790 --> 01:19:25,730 those instructions. 1561 01:19:25,730 --> 01:19:32,130 Of course, if you get a job with one of the graphics 1562 01:19:32,130 --> 01:19:35,760 companies, then you may become very familiar with these kinds 1563 01:19:35,760 --> 01:19:38,080 of instructions. 1564 01:19:38,080 --> 01:19:41,800 There's a lot more C and C++ constructs that we don't have 1565 01:19:41,800 --> 01:19:42,750 to go into. 1566 01:19:42,750 --> 01:19:46,450 You can have arrays of structs versus structs of arrays. 1567 01:19:46,450 --> 01:19:48,125 And there can be a difference in performance. 1568 01:19:48,125 --> 01:19:50,860 1569 01:19:50,860 --> 01:19:56,710 If you have an array of structs, then it makes it, if 1570 01:19:56,710 --> 01:19:59,530 you're accessing one struct, you can access the other 1571 01:19:59,530 --> 01:20:01,880 structs very easily. 1572 01:20:01,880 --> 01:20:06,590 But if you're using things like the SSC instructions, 1573 01:20:06,590 --> 01:20:09,970 then it may be better to have structs of arrays. 1574 01:20:09,970 --> 01:20:12,140 Because then when you're access an array, you can 1575 01:20:12,140 --> 01:20:17,160 stream what's called a stride of one, a regular stride of 1576 01:20:17,160 --> 01:20:20,170 just one memory location after the next, to do the 1577 01:20:20,170 --> 01:20:22,310 processing. 1578 01:20:22,310 --> 01:20:25,090 So the hardware doesn't work as well if you skip by 1579 01:20:25,090 --> 01:20:29,140 seventeens to gather things compared if you just get one 1580 01:20:29,140 --> 01:20:29,980 thing after the next. 1581 01:20:29,980 --> 01:20:32,590 Because there's prefetching logic that tries to fetch 1582 01:20:32,590 --> 01:20:34,620 things from memory faster. 1583 01:20:34,620 --> 01:20:36,570 There are things like function pointers. 1584 01:20:36,570 --> 01:20:40,380 So you can have store function pointer into something and 1585 01:20:40,380 --> 01:20:44,720 then call that function indirectly. 1586 01:20:44,720 --> 01:20:47,310 There's things like bit fields in arrays. 1587 01:20:47,310 --> 01:20:49,460 There are objects, virtual function tables. 1588 01:20:49,460 --> 01:20:52,080 We'll get into some of these when we do C++. 1589 01:20:52,080 --> 01:20:54,170 And there's a variety of stuff having to do with memory 1590 01:20:54,170 --> 01:20:56,250 management that we'll talk about. 1591 01:20:56,250 --> 01:20:59,890 But this is mainly to get you folks sort of at the level 1592 01:20:59,890 --> 01:21:01,810 where you can sort of understand and feel 1593 01:21:01,810 --> 01:21:05,170 comfortable with dealing with the assembler. 1594 01:21:05,170 --> 01:21:07,140 And you'll see that those resources 1595 01:21:07,140 --> 01:21:08,440 are pretty good resources. 1596 01:21:08,440 --> 01:21:12,440 But the basics are relatively simple, but it's hard to do it 1597 01:21:12,440 --> 01:21:18,110 without a manual or some online reference material. 1598 01:21:18,110 --> 01:21:19,360 Any questions? 1599 01:21:19,360 --> 01:21:22,500 1600 01:21:22,500 --> 01:21:26,770 What are the first two lessons I taught you today? 1601 01:21:26,770 --> 01:21:28,826 Number one is-- 1602 01:21:28,826 --> 01:21:30,260 AUDIENCE: [INAUDIBLE] 1603 01:21:30,260 --> 01:21:32,880 CHARLES LEISERSON: --write tests before you write code. 1604 01:21:32,880 --> 01:21:34,990 And what's the second lesson? 1605 01:21:34,990 --> 01:21:35,750 AUDIENCE: Pair programming. 1606 01:21:35,750 --> 01:21:37,360 CHARLES LEISERSON: Pair programming, not divide and 1607 01:21:37,360 --> 01:21:41,060 conquer, I teach algorithms where divide and conquer is a 1608 01:21:41,060 --> 01:21:43,040 fabulous technique. 1609 01:21:43,040 --> 01:21:45,650 With programming, pair programming is going to have 1610 01:21:45,650 --> 01:21:49,710 you generally get where you want to get faster than if 1611 01:21:49,710 --> 01:21:51,180 you're programming alone. 1612 01:21:51,180 --> 01:21:54,915 1613 01:21:54,915 --> 01:21:56,750 OK, thank you. 1614 01:21:56,750 --> 01:21:58,720